ABSTRACT


This Final Contract Report contains papers presenting several useful steps toward
the creation of a more scientific discipline of formatted file design.

In particular, there are papers on:

(1) The first extensive, fundamentally oriented comparison of key-to-address
transforms utilizing existing formatted files.

(2) Formal, mathematical descriptions of formatted file systems that are used
to provide concepts and means to deal with:

(a)  the  selection  of  indexes; 

(b)  direct  retrieval  on  the  basis  of  multiple  attributes,  and 

(c)  questions  of  storage  and  response  time  efficiency. 

(3)  The  calibration  of  the  FOREM  I  Formatted  File  Organization  Simulation  Model. 

(4) A new, more powerful 9000 FORTRAN statement model (FOREM II) for simulating
the effects of complex file organizations and machine configurations on
efficiency and response times in a formatted file query and update environment.
INTRODUCTION


The  field  of  Formatted  File  Organization  has  become  increasingly  important  with 
the  advent  of  large  random  access  peripheral  files  and  requirements  for  real-time 
retrieval  and  maintenance.  It  is  the  purpose  of  this  contract  to  provide  new 
techniques,  tools  and  information  which  will  lead  to  a  fundamental  design 
discipline for operational files. This design discipline will be solidly based
on studies of actual computer hardware systems, software systems, and associated
user  files. 

The  original  IBM  Research  Division-IBM  Federal  Systems  Division  work  in  the  area 
was  recorded  in  the  Final  Report  for  Rome  Air  Development  Center  Contract 
No. AF 30(602)-4G8S. This previous report presented for the first time:

(1)  Detailed  comprehensive  surveys  of  the  processing  and  content  of  four  large 
intelligence files in unclassified, implementation-independent form;

(2)  New  techniques  for  the  organization  of  formatted  files  for  direct  multiple 
attribute  retrieval; 

(3) A file organization evaluation model (FOREM I) which accepts the survey
material  in  all  its  detail  with  respect  to  file  content  and  transactions 
(complex  queries  and  updates)  and  allows  the  user  to  evaluate  the  effects 
of  different  file  organizations  on  system  efficiency  and  response  time. 

The above effort provided a basis for dealing not just with abstract theories,
but  also  with  existing  files  and  hardware-software  systems.  As  the  result  of  a 
further  contract  No.  AF  30602-69-C-0100,  a  prototype  file  design  handbook  was 
created  using  information  obtained  from  thousands  of  actual  computer  system  and 
simulation  model  (primarily  FOREM  I)  runs.  This  handbook  represents  a  first 
effort  to  create  design  guidelines  based  on  extensive  empirical  data. 

The  present  Final  Contract  Report  represents  another  step  forward  in  the  creation 
of  empirical  information,  techniques,  and  practical  tools  for  file  design.  Here 
again,  the  philosophy  is  that  empirical  and  abstract  contributions  are  necessary 
for the eventual creation of a science of file design, but that the abstract
contributions must be solidly connected with practical understanding of actual
systems. The work to be reported covers three areas.

The initial area continues the study of actual files. It is represented by a
paper on a fundamentally oriented, comparative study of key-to-address transforms
using actual key sets. This paper makes a significant departure from existing
literature on key-to-address transforms: it adds no new transform proposal
to the many existing, unevaluated, uncompared transforms; instead, on the basis
of actual runs, it presents a number of useful facts and guidelines for selecting
a transform appropriate to the user's key set. In its conclusions section, it
goes further to propose and discuss more fundamental techniques for selecting
transforms on the basis of defined characterizations of both transforms and key
sets.

The  second  area  continues  work  on  creating  a  fundamental  basis  for  file  design. 

In  this  area,  two  papers  present  formal,  mathematical  descriptions  of  certain 
aspects  of  formatted  file  organizations.  These  descriptions  are  then  used 
to  provide  means  and  concepts  for  dealing  with  the  selection  of  indexes  and  the 
questions  of  storage  and  response  time  efficiency.  A  third  paper  extends  the 
power  of  prior  work  on  direct,  multiattribute  retrieval. 

The  final  area  is  concerned  with  the  calibration  of  the  FOREM  I  model  and  the 
creation  and  description  of  a  new,  significantly  more  powerful  program  for 
modeling formatted file organizations (FOREM II). This model, which is a 9000
FORTRAN statement program, provides expanded capability in almost all areas over
FOREM I; the most important area, however, is the ability to deal with simultaneous
operations  within  a  single  program,  with  much  more  complex  machine  configurations, 
and  with  a  wider  variety  of  query  formulations. 

The  model  is  described  in  a  final  paper  and  by  an  included  copy  of  the  user's 
guide  documentation. 

In  summary,  this  Final  Contract  Report  contains  papers  presenting  several  useful 
steps  toward  the  creation  of  a  more  scientific  discipline  of  formatted  file 
design. 
KEY-TO-ADDRESS TRANSFORM TECHNIQUES:

A  FUNDAMENTAL  PERFORMANCE  STUDY  ON  LARGE  EXISTING  FORMATTED  FILES 


by 


V. Y. Lum
P.  S.  T.  Yuen 
M.  Dodd 


Information  Sciences  Department 
IBM Research Laboratory
San  Jose,  California 


ABSTRACT:  This  paper  presents  the  results  of  a  study  of  eight  different 
key-to-address  transformation  methods  applied  to  a  set  of  existing  files.  As 
each  method  is  applied  to  a  particular  file,  load  factor  and  bucket  size  are 
varied  over  a  wide  range.  In  addition,  appropriate  variables  pertinent  only 
to  a  specific  method  also  take  on  different  values.  The  performance  of  each 
method  is  summarized  in  terms  of  the  number  of  accesses  required  to  get  to  a 
record and the number of overflow records created by a transformation.
Peculiarities of each method are discussed. Practical guidelines obtained from
the results are stated. Finally, a proposal for further quantitative fundamental
study is outlined in the conclusion.
INTRODUCTION


The  direct  access  method  normally  provides  the  most  rapid  means  of  accessing  a 
single  record  of  a  formatted  file.  In  cases  where  there  is  one  record  or  nearly 
one  record  per  possible  key  value,  access  requires  only  a  multiplication  of  the 
key  by  the  record  length  to  obtain  the  address  of  the  desired  record.  The 
record  can  then  be  obtained  by  only  one  access  to  the  peripheral  file.  When 
there  is  less  than  one  record  for  every  two  or  more  possible  key  values,  then 
the  direct  multiplication  transform  will  leave  a  considerable  amount  of  empty 
space  for  key  values  that  are  unused.  To  reduce  the  amount  of  waste  space, 
numerous  workers  have  proposed  a  means  for  mapping  large  key  spaces  into  smaller 
address  spaces.  The  main  problem  is  that  none  of  the  practical  instances  of  these 
key-to-address  transforms  can  guarantee  to  produce  a  uniform  mapping  of  keys  to 
addresses  for  any  arbitrary  distribution  of  key  values.  Given  this  situation, 
one  needs  guidance  on  the  selection  of  a  technique  that  will  produce  the  most 
nearly  uniform  distribution  for  his  practical  situation. 
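
As a rough illustration of the waste described above, the following Python sketch computes a direct-access address and the fraction of reserved slots that actually hold records. The function names and the sample figures (six-digit numeric keys, 100-byte records, 2,500 records) are assumptions chosen for illustration; they are not values from the study.

```python
def direct_address(key: int, record_length: int) -> int:
    """Storage offset of the record for `key` under the direct multiplication transform."""
    return key * record_length


def slot_utilization(num_records: int, key_space_size: int) -> float:
    """Fraction of the reserved record slots that actually hold a record."""
    return num_records / key_space_size


# Hypothetical file: six-digit numeric keys, 100-byte records, 2,500 records.
offset = direct_address(key=4217, record_length=100)
used = slot_utilization(num_records=2_500, key_space_size=10**6)
print(offset, used)
```

With so few of the possible key values in use, almost all of the space reserved by the direct transform would stay empty, which is the situation that motivates mapping the key space into a smaller address space.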

Unfortunately, up to this time, workers in the field have devoted their efforts
toward inventing new transforms rather than toward creating guidelines by
comparing existing ones in practical situations so that their relative performance
can be characterized. The only comparative evaluation known to the
authors appears in Buchholz.* He presents an excellent discussion of various
aspects of key-to-address transformation, but his experimental results are
minimal. In this paper, we undertake to provide a major experimental, comparative
evaluation of several transform techniques and have obtained several
pragmatic user guidelines for the selection of an appropriate practical transform.

Based  on  this  information,  we  also  discuss  in  the  conclusions  section  a  possibly 
more  quantitative,  fundamental  approach  to  transform  selection.  In  particular, 
we  seek  to  define  two  sets  of  characterization  functions  which  may  be  applied 
to  key  sets  and  to  transforms.  If  the  characterization  functions  are  valid, 
then we should be able to make quantitative comparisons between the characterization
function values for a particular key set and the characterization function
values  for  proposed  transforms  to  decide  which  transform  is  likely  to  perform 
best. 
EXPERIMENTAL ENVIRONMENT


There  are  many  factors  affecting  the  performance  of  key-to-address  transform 
techniques.  This  section  contains  a  discussion  of  the  dominant  parameters 
considered  in  the  present  experiments.  The  topics  presented  include: 

(1)  the  key  set  sample 

(2)  the  transformation  method 

(3)  the  variable  parameters 

(4)  the  method  of  handling  overflows  or  clashes. 


The  Sample  Key  Sets 

Characteristics  of  the  eight  files  used  in  this  experiment  are  shown  in  Table  I. 
The  eight  key  sets  contain  files  of  different  sizes  with  a  variety  of  key  types. 
Some  keys  are  long,  some  are  short,  some  alphanumeric,  and  some  numeric.  In 
addition,  some  files  have  keys  densely  distributed  in  the  key  space,  and  some 
sparsely  distributed.  Because  of  its  diversity,  it  is  believed  that  this  sample 
is  representative  of  the  general  range  of  files.  The  selected  transformation 
methods  will  be  applied  to  each  of  these  files. 

Variables 

As mentioned in many articles [1, 2, 5], the two dominant variables affecting
performance are loading factor and bucket size. The former is the ratio of
the number of records to the number of record slots. (A slot is a unit of
storage space that can hold one record.) The latter is the number of records
that can be accommodated in an image under a transformation. A decrease in the
loading factor reduces the probability that many records will be mapped to the
same location, and an increase in the bucket size increases the capacity of each
image. Both will tend to reduce the number of overflow records. In this experiment,
the loading factor is varied from 0.5 to 0.95 at intervals of 0.05. For
each choice of the loading factor, the bucket size takes on the values of 1, 2,
5, 10, 20 and 50.
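
The parameter grid described above can be sketched in Python as follows. This is a minimal sketch, not the authors' program: the names `LOADING_FACTORS`, `BUCKET_SIZES`, and `num_buckets` are hypothetical, and rounding the slot and bucket counts upward is an assumption, since the text does not say how fractional counts were handled.

```python
import math

# Loading factors 0.50, 0.55, ..., 0.95 and the six bucket sizes named in the text.
LOADING_FACTORS = [round(0.50 + 0.05 * i, 2) for i in range(10)]
BUCKET_SIZES = [1, 2, 5, 10, 20, 50]


def num_buckets(num_records: int, loading_factor: float, bucket_size: int) -> int:
    """Buckets needed so that records / slots does not exceed the loading factor.

    Assumption: slot and bucket counts are rounded up to whole numbers.
    """
    slots = math.ceil(num_records / loading_factor)
    return math.ceil(slots / bucket_size)


# Every (loading factor, bucket size) combination applied to one key set.
grid = [(lf, b) for lf in LOADING_FACTORS for b in BUCKET_SIZES]
print(len(grid))
```

Each key set and transformation method would then be run once for every pair in `grid`.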


Key Set               Number of Records   Type*   Key Length (in no. of symbols**)

County State Code            3072           N        5
Personnel                    2241           N        6
Personnel Location            930                    7
Applicants                    762                    6
Customer Code               24050                    6
Product Code                33575                    6
Library                      4909                    12
Random Numbers                500                    10

*A = alpha, N = numeric, AN = alphanumeric
**A symbol can be a digit, letter, etc.


TABLE I
There are other variables pertinent only to a specific transformation. They
will  be  described  in  the  discussion  of  the  appropriate  methods. 

Transformation  Methods 

Six different techniques, producing eight different transformation methods, have
been studied. Each method transforms the keys into addresses with bucket size
equal to 1. In the case of a larger bucket size, the bucket addresses are
determined by a modulo B operation, where B is the number of buckets available.
An alternative is to map the key directly into bucket addresses in one
process. However, it was found from the tests made that the alternative showed
no significant difference from the first approach. Because of the way some of
the methods operate, the modulo operation cannot be eliminated from these methods.
Consequently, it was decided that the first approach would be used throughout
the experiment.

(i)  Division  -  Undoubtedly,  the  best  known  and  most  frequently  used 
technique  is  division  of  the  key  by  a  positive  integer,  particularly  a  prime  number. 
In  this  method,  the  remainder  obtained  from  the  division  becomes  the  address 

for  the  key.  The  divisor,  q,  is  usually  chosen  to  be  approximately  equal  to 
the  number  of  available  addresses,  M.  Buchholz*  suggested  a  refinement  that  q 
be  the  largest  prime  number  smaller  than  M.  The  utility  of  his  suggestion  is 
not  so  obvious.  Given  that  a  key  distribution  contains  clusters  of  various 
sizes  at  random  with  gaps  of  different  lengths  also  at  random,  it  may  be  that 
the  choice  of  any  q  equal  to  or  near  M  will  perform  just  as  well.  One  set 
of  experiments  was  performed  to  check  the  truth  of  this  conjecture. 
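
A minimal Python sketch of the division method follows, including the refinement of taking the largest prime smaller than M as the divisor. The function names are hypothetical, and the trial-division primality check is simply the easiest choice for table sizes of this order, not something the paper prescribes.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate for address-space sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def largest_prime_below(m: int) -> int:
    """Largest prime strictly smaller than m."""
    n = m - 1
    while n >= 2 and not is_prime(n):
        n -= 1
    if n < 2:
        raise ValueError("no prime below m")
    return n


def division_address(key: int, q: int) -> int:
    """Address of `key` under the division method: the remainder after dividing by q."""
    return key % q


q = largest_prime_below(50)       # divisor for a hypothetical 50-address space
print(q, division_address(975, q))
```

Passing any other q near M to `division_address` gives the non-prime variants whose behavior the conjecture above concerns.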

(ii)  Digit  Analysis  -  In  this  method,  the  distribution  of  values  of  the 
keys  in  each  position  or  digit  (where  digit  is  not  necessarily  a  decimal  digit) 
is  determined.  Those  positions  having  the  most  skewed  distributions  will  be 
deleted  from  the  key  until  the  number  of  remaining  digits  is  equal  to  the 
desired  address  length,  which  is  the  number  of  digits  in  the  highest  slot  number. 

The criterion adopted to find the digits to be used as addresses, based on the
measure of uniformity in the distribution of values in each digit, is to keep
those positions having no abnormally high peaks or valleys and those having
small standard deviations. In a given file, the same digits must be dropped from
all keys.
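
One way to picture digit analysis is the Python sketch below. It is an approximation under stated assumptions: keys are fixed-length decimal strings, and the only uniformity measure used is the standard deviation of the digit counts in each position. The paper's additional check for abnormal peaks or valleys is not modeled, and the function names are hypothetical.

```python
from collections import Counter
from statistics import pstdev


def select_positions(keys: list[str], address_length: int) -> list[int]:
    """Pick the `address_length` positions whose digit counts are most uniform.

    Assumption: uniformity is judged only by the standard deviation of the
    ten digit counts in each position (smaller means more uniform).
    """
    key_length = len(keys[0])

    def spread(pos: int) -> float:
        counts = Counter(k[pos] for k in keys)
        return pstdev([counts.get(str(d), 0) for d in range(10)])

    ranked = sorted(range(key_length), key=spread)
    return sorted(ranked[:address_length])


def digit_analysis_address(key: str, positions: list[int]) -> int:
    """Address formed from the digits of `key` at the retained positions."""
    return int("".join(key[p] for p in positions))


sample = ["1203", "1214", "1225", "1236"]   # first two positions never vary
kept = select_positions(sample, address_length=2)
print(kept, [digit_analysis_address(k, kept) for k in sample])
```

The same `kept` positions are applied to every key, matching the requirement that the same digits be dropped from all keys in a file.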


Digit analysis is the only method investigated that exploits key distribution and
it is only a partial exploitation. There do exist "perfect" transformations which
exploit knowledge of the key distributions to produce perfectly uniform distributions
of addresses. These transformations generally require extremely extensive
manipulations of the key sets and are not practical for data bases that receive
even a single new update record. Our study is therefore confined to the practical
"non-perfect" transformations that cannot guarantee perfectly uniform distributions
for arbitrary key sets.


(iii)  Mid-Square  -  A  key  is  multiplied  by  itself  and  its  address  is 
obtained  by  truncating  digits  at  both  ends  of  the  product  until  the  number  of 
digits  left  is  equal  to  the  desired  address  length.  As  in  the  digit  analysis 
method,  the  same  positions  must  be  kept  from  all  products. 
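
The mid-square step can be sketched in Python as follows. The function name is hypothetical, and two details are assumptions: the product is zero-padded to twice the key length so that the same positions are kept for every key, and the retained digits are centered, with any odd leftover digit dropped from the right.

```python
def mid_square_address(key: int, key_digits: int, address_digits: int) -> int:
    """Square the key and keep `address_digits` digits from the middle of the product.

    Assumption: the product is left-padded with zeros to 2 * key_digits digits
    so that the same digit positions are retained for every key.
    """
    product = str(key * key).zfill(2 * key_digits)
    start = (len(product) - address_digits) // 2
    return int(product[start:start + address_digits])


# 975 * 975 = 950625; the middle two digits are "06".
print(mid_square_address(975, key_digits=3, address_digits=2))
```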


(iv) Folding - A key is partitioned into a number of parts each of which,
except the last, has the same length as the address length. (There are methods
which partition a key into shorter parts. These methods have not been investigated
here because it is believed that their characteristics are about the same
as the ones studied.) Two methods have been investigated. One folds the key
at the boundary of the parts as if folding paper. Digits falling into the same
position will be added together. The other method is to shift over the sections
so that the lower ends of the sections align before carrying out the addition.
These two methods will be referred to as fold-boundary and fold-shifting,
respectively. In either case, decimal addition is used. Figure 1 below
illustrates the positional manipulation of the two methods.
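
The two folding variants can be sketched in Python as shown here. Several details are assumptions rather than statements from the paper: in fold-boundary the alternate parts (first, third, and so on) are the ones reversed, a short final part is treated as an ordinary number, and the raw sum is returned so that any reduction to the address range is left to the caller. The function names are hypothetical.

```python
def split_parts(key: str, address_length: int) -> list[str]:
    """Cut the key into parts of `address_length` digits; the last part may be shorter."""
    return [key[i:i + address_length] for i in range(0, len(key), address_length)]


def fold_shifting(key: str, address_length: int) -> int:
    """Align the low-order ends of all parts and add them (decimal addition)."""
    return sum(int(part) for part in split_parts(key, address_length))


def fold_boundary(key: str, address_length: int) -> int:
    """Fold at the part boundaries like paper, then add.

    Assumption: folding reverses the digit order of every other part,
    starting with the first.
    """
    parts = split_parts(key, address_length)
    return sum(
        int(part[::-1]) if index % 2 == 0 else int(part)
        for index, part in enumerate(parts)
    )


print(fold_shifting("123456789", 3))   # 123 + 456 + 789
print(fold_boundary("123456789", 3))   # 321 + 456 + 987
```

Because the sum can carry beyond the address length, a final reduction such as the modulo B step described earlier would normally follow.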
(v) Lin's Method - In this method, a key is expressed in radix $p$ and
the result taken modulo $q^m$, where $p$ and $q$ are relatively prime and $m$ is a
positive integer. Given a key, it is first written as a simple binary bit string.
These bits are then grouped to form $p$-nary digits. The result is expressed as a
decimal number which, taken modulo $q^m$, gives the address. To simplify the selection
of $p$, $q$, and $m$, it was decided to have $p = q + 1$, and $p$ and $m$ are so
chosen that $q^m$ approximates the number of addresses available.

Let us illustrate the process with an example. Suppose that the key is 975.
Encoding each of the digits with 4 bits (the smallest number of bits required to
represent a decimal digit), we have a binary string of 100101110101. Now, if
the number of addresses available is 48, the choice of $p = 8$, $q = 7$, and $m = 2$
conforms to the rule defined. Grouping three bits together, we have
$(4565)_8 = (2421)_{10}$. The address, then, is given as 20 by taking 2421 modulo 49.
(Note that if $p$ had been 10, then the address would have been obtained simply
by taking the key modulo $q^m$, i.e., same as the division method.) Note that there
are different ways to express a key in a binary vector, e.g., BCD or binary.
Our investigation showed that the results do not vary significantly for these
cases.

The  details  in  this  mapping  may  not  be  exactly  as  proposed  by  Lin,^  but  the 
principle  remains  the  same. 
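
The worked example for key 975 can be traced step by step with the short Python sketch below. It follows the steps as described in this paper, not necessarily Lin's original formulation. The function names are hypothetical, and the sketch assumes 4-bit encoding of each decimal digit and a radix p that is a power of two, so that bits can be grouped evenly.

```python
from math import gcd


def bcd_bits(key: str) -> str:
    """Write each decimal digit of the key as 4 bits and concatenate."""
    return "".join(format(int(d), "04b") for d in key)


def radix_digits(bits: str, p: int) -> list[int]:
    """Group the bit string into radix-p digits (this sketch requires p to be a power of two)."""
    width = p.bit_length() - 1
    if p < 2 or 2 ** width != p:
        raise ValueError("this sketch only handles p that is a power of two")
    padded = bits.zfill(-(-len(bits) // width) * width)  # left-pad to a whole number of groups
    return [int(padded[i:i + width], 2) for i in range(0, len(padded), width)]


def lin_address(key: str, p: int, q: int, m: int) -> int:
    """Evaluate the radix-p digits as a number and reduce it modulo q**m."""
    if gcd(p, q) != 1:
        raise ValueError("p and q must be relatively prime")
    value = 0
    for digit in radix_digits(bcd_bits(key), p):
        value = value * p + digit
    return value % (q ** m)


print(bcd_bits("975"))                    # 100101110101
print(radix_digits(bcd_bits("975"), 8))   # [4, 5, 6, 5]
print(lin_address("975", p=8, q=7, m=2))  # 20
```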

(vi) Algebraic Coding - Each digit of a key is considered to be a polynomial
coefficient. The polynomial so obtained is divided by another polynomial
g(x) which is invariant for all the keys in a set. The coefficients of the
remainder polynomial form the address.

This method, based on the theory of error correcting codes [3, 4, 12], assures that
if g(x) is chosen in such a manner that all polynomials containing g(x) as a
factor have a minimum weight or distance (Hamming distance) of d, then no two
keys differing in fewer than d positions can be mapped to the same address.
Application of this theory requires the coefficients of g(x) and k(x), the
divisor and the polynomial from the key, respectively, to be elements of a
Galois field of q elements, GF(q) [12], where q is a power of a prime. Thus,
a decimal key must have as its components elements in GF(q) with q > 10.
Alternatively, one may expand the key into a vector or a string of elements
with the value of each element smaller than 10. Two Galois fields, GF(2)
and GF(16), have been chosen more or less arbitrarily for this study.
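
For the GF(2) case, the remainder computation can be sketched in Python with polynomials packed into integers (bit i holds the coefficient of x^i). The generator x^3 + x + 1 used below is an illustrative assumption, not a polynomial taken from the study; its nonzero multiples of degree below 7 all have weight at least 3, so 7-bit keys that differ in one or two positions receive different addresses.

```python
def gf2_remainder(k: int, g: int) -> int:
    """Remainder of k(x) divided by g(x) with coefficients in GF(2).

    Polynomials are packed into integers: bit i is the coefficient of x**i.
    Subtraction in GF(2) is XOR, so long division is a sequence of shifted XORs.
    """
    if g == 0:
        raise ValueError("g(x) must be nonzero")
    degree_g = g.bit_length() - 1
    while k.bit_length() - 1 >= degree_g:
        shift = (k.bit_length() - 1) - degree_g
        k ^= g << shift
    return k


G = 0b1011  # hypothetical generator g(x) = x^3 + x + 1

print(gf2_remainder(0b1000000, G))  # x^6 mod g(x) = x^2 + 1, i.e. 5

# Check the distance property for every pair of 7-bit keys.
collision_free = all(
    gf2_remainder(a, G) != gf2_remainder(b, G)
    for a in range(128)
    for b in range(128)
    if 0 < bin(a ^ b).count("1") < 3
)
print(collision_free)
```

Keys that differ in exactly three positions can still collide, for example when the difference is g(x) itself.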

The selection of g(x) was done in two ways: (1) selecting g(x) on the
basis of distance, and (2) selecting g(x) randomly with the restriction that
neither the highest degree coefficient nor the constant term is zero. The
degree of g(x), of course, is determined from the size of the storage available
so that the remainder can cover the range of addresses. More precise and
detailed discussions on this method can be found in Peterson [12], Schay [3], and
Hanan [4].

In  addition  to  mapping  a  set  of  keys  into  addresses  using  one  of  the  methods 
discussed,  it  is  also  possible  to  create  a  transformation  with  a  combination  of 
two  or  more  methods.  For  example,  one  may  first  multiply  a  key  by  itself  and 
the  product  is  then  folded  to  form  an  address.  Here,  mid-square  and  folding  are 
used  in  conjunction.  Lin's  method  is  essentially  a  combination  of  two  basic 
methods: radix transformation and division. (Actually, nearly all transformation
methods require the application of the division method to find the bucket
addresses.)  In  this  experiment,  combining  methods  will  not  be  studied  because 
it  is  believed  that  the  characteristics  of  the  individual  methods  determine  the 
characteristics  of  a  combination. 

Alphanumeric  Keys 

Since some of the key sets are alphabetic or alphanumeric and since nearly all
the transformation methods operate only on numerical values, it becomes necessary
to encode the alphabetic or alphanumeric keys as numeric keys. Several different
encoding schemes have been tried. No significant variation in performance was
discovered provided the encoding schemes preserve distinctness of the symbols.

The scheme finally selected and used throughout the experiment is to encode the
letters a, b, c, ..., z into decimal numbers 11, 12, 13, ..., 36. Numerical
digits remain unchanged. It is understood that key length refers to the number
of digits after encoding.
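
The selected encoding can be written as a few lines of Python. The function name is hypothetical, and folding upper-case letters to lower case is an assumption, since the paper does not discuss letter case.

```python
def encode_key(key: str) -> str:
    """Encode letters a..z as 11..36 and leave decimal digits unchanged.

    Assumption: upper-case letters are treated the same as lower-case ones.
    """
    encoded = []
    for ch in key.lower():
        if ch in "0123456789":
            encoded.append(ch)
        elif "a" <= ch <= "z":
            encoded.append(str(ord(ch) - ord("a") + 11))
        else:
            raise ValueError(f"unsupported symbol in key: {ch!r}")
    return "".join(encoded)


print(encode_key("abc"))   # 111213
print(encode_key("a1z"))   # 11136
```

Each letter becomes two digits, which is why the key length is counted after encoding.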
Overflow Storage


In  general,  the  available  memory  or  address  space  is  divided  into  small  sections 
called  buckets.  Every  record  will  be  mapped  into  one  of  these  buckets.  Since 
non-perfect  transformation  techniques  may  map  an  excessive  number  of  records  into 
the  same  bucket,  methods  must  be  devised  to  handle  overflow  records  which  cannot 
be  accommodated  by  their  home  addresses. 

The two basic techniques commonly adopted to accommodate overflow records are:

(1) storing them in vacancies in another bucket in the prime area, and

(2) storing them in a separate or independent overflow area.

Many variations are possible in each basic technique. For example, one version of the first
technique  is  the  search  for  vacancies  successively  starting  from  a  record's 
home  bucket.  The  process  continues  until  an  accommodation  is  found.  (Storage 
space  is  considered  to  be  circular  and  the  amount  of  storage  space  must  be  large 
enough  to  hold  all  the  records.)  This  technique,  proposed  by  Peterson,  is 
usually  called  the  open  addressing  or  consecutive  spill  method.  A  variation  of 
this  method  is  to  search  for  space,  whenever  a  record's  home  bucket  is  filled, 
by  skipping  a  number  of  buckets  as  defined  by  a  selected  rule.^’15’^  When  this 
skipping  technique  is  used,  one  should  select  a  rule  such  that  the  entire  storage 
space  can  be  searched  when  necessary. 

A  basic  version  of  the  separate  overflow  method  is  chaining.  Here,  the  location 
of  the  first  overflow  record  from  each  bucket  is  listed  in  the  record's  home 
bucket.  Pointers  are  stored  in  each  successive  overflow  record  in  the  chain 
to indicate the address of the next record. A variation of the separate overflow
technique is to provide small areas, each of which can only be used to store
overflow  records  from  a  particular  section  of  the  prime  area.  Overflow  records 
that  cannot  be  accommodated  here  go  to  a  larger,  independent  area  available  to 
all  buckets. 

Our  study  will  be  limited  to  the  basic  techniques  of  open  addressing  and  of 
chaining  in  separate  overflow  areas. 
Performance Measure

In order to compare the performance of various transformation techniques, a
standard of measurement must be established. After an investigation of various
approaches, the average number of accesses per record and the number of overflow
records were found to be the most appropriate performance indicators.

In open addressing, the number of accesses for a given record is equal to S + 1,
where S is the distance, in buckets, of the record's actual location from its
home bucket. In chaining, each record located in its home bucket is said to
require one access. A given overflow record is said to need T + 1 accesses,
where T is its chain position. For each transformation, the average number of
accesses per record and the percentage of overflow records for each key set
have been calculated. Further averages over the entire eight key sets were then
computed.
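
The two overflow techniques and the access counts defined above can be sketched in Python as follows. This is a simplified model under stated assumptions: records are loaded in the order given, chain positions are numbered from 1 per home bucket, and the percentage of overflow records is taken as the share of records needing more than one access. The function names are hypothetical.

```python
def open_addressing_accesses(homes: list[int], num_buckets: int, bucket_size: int) -> list[int]:
    """Accesses per record (S + 1) when overflows spill to the next bucket with a vacancy."""
    if len(homes) > num_buckets * bucket_size:
        raise ValueError("storage must be large enough to hold all the records")
    fill = [0] * num_buckets
    accesses = []
    for home in homes:
        s = 0
        while fill[(home + s) % num_buckets] >= bucket_size:  # storage is circular
            s += 1
        fill[(home + s) % num_buckets] += 1
        accesses.append(s + 1)
    return accesses


def chaining_accesses(homes: list[int], num_buckets: int, bucket_size: int) -> list[int]:
    """Accesses per record when overflows are chained in a separate area (T + 1)."""
    mapped = [0] * num_buckets
    accesses = []
    for home in homes:
        mapped[home] += 1
        t = max(0, mapped[home] - bucket_size)  # chain position; 0 means stored at home
        accesses.append(t + 1)
    return accesses


def summarize(accesses: list[int]) -> tuple[float, float]:
    """Average accesses per record and percentage of overflow records."""
    n = len(accesses)
    return sum(accesses) / n, 100.0 * sum(a > 1 for a in accesses) / n


homes = [0, 0, 0, 1]  # hypothetical home buckets produced by some transformation
print(summarize(open_addressing_accesses(homes, num_buckets=4, bucket_size=1)))
print(summarize(chaining_accesses(homes, num_buckets=4, bucket_size=1)))
```

In this small example the fourth record is displaced under open addressing by spills from bucket 0, while under chaining it stays in its home bucket.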

EMPIRICAL  RESULTS  AND  DISCUSSION 

Tables  II  to  XXI  present  the  results  of  the  study.  Summarized  in  the  tables  are 
the  average  accesses  per  record  for  the  two  different  overflow  storage  techniques, 
the  average  percentage  of  overflows  for  each  transformation,  and  the  standard 
errors. It can be seen from these tables that the division technique gives the
best  overall  performance  and  that  the  mid-square  technique  is  a  close  second.  In 
fact,  the  mid-square  method  has  the  lowest  number  of  accesses  per  record  for 
open  addressing  and  loading  factors  below  0.75.  The  mid-square  method  also 
provides  the  most  consistent  performance  as  evidenced  by  the  small  standard  error 
for  the  various  key  distributions.  Among  the  other  methods,  the  algebraic 
technique  is  good  when  chaining  of  overflow  records  is  used.  Lin's  method  is 
consistently poor; folding and digit analysis are erratic.

All the transformation techniques display the same performance trend; namely, the
number of accesses per record and the percentage of overflow records increase
with higher loading factor and decrease with larger bucket size. The changes
are gradual when chaining overflow is used, but they are very drastic for open
addressing with small bucket size, i.e., 1 or 2. Indeed, open addressing
performance for small bucket sizes is so erratic that, even with a loading factor
of only 0.5, there are cases which require more than 800 accesses on the average
to retrieve a record. From the results obtained, one can conclude that
open addressing should not be used for small bucket sizes.

The results for open addressing with small bucket sizes are much worse than those
obtained in Peterson's experiment [5]. The discrepancy undoubtedly is caused by
Peterson's idealized assumption that a Poisson distribution of addresses would
result from the transformations.

When  bucket  size  becomes  larger,  open  addressing  improves  rapidly.  At  20  or  50, 
it  outperforms  chaining  in  general.  However,  because  of  the  small  number  of 
overflow  records  at  these  bucket  sizes,  the  difference  is  very  slight.  Hence, 
the  use  of  either  technique  will  be  equally  satisfactory. 

When tabulating the results of the transformation methods on the key sets before
averages are taken, it was found that no mapping method is consistently the best.
For example, the two methods using the folding technique are excellent in some of
the files but because of one or two poor results (not necessarily the same ones
in the two different kinds of folding), degradation in performance occurs after
averaging. The same phenomenon occurs for all transformation methods. If,
before averaging, one or two of the poor results are removed from the data of
each transformation, then nearly all the techniques will show about the same
performance.

Every  method  of  transformation  has  its  idiosyncrasies.  Let  us  briefly  discuss 
each. 

Consider  first  the  method  of  simple  division.  As  mentioned  before,  the  keys  are 
believed  to  be  distributed  in  clusters  of  various  sizes  separated  by  gaps  of 
different lengths. If this assumption is correct, the choice of a divisor
becomes immaterial. An experiment was designed to shed some light on the subject.
Prime,  odd  but  not  prime,  and  even  numbers  have  been  chosen  as  divisors. 

In each of the three categories tested, an abrupt change in performance sometimes
occurs for a small variation in loading factor and/or bucket size. The frequency
of occurrence of this behavior is less than 2% of all cases tested for the prime
divisors, about 2% for the odd divisors and about 10% for the even divisors.

The abruptness is much more pronounced in the case of even divisors. However,
most of the ten percent occurs in the results of two of the key sets. Detailed
investigation revealed that each of the key sets giving poor results with even
divisors has a preponderance of odd numbers. Since evenness and oddness are
preserved after division by an even number, a skew distribution of addresses
exists after transformation and poor performance therefore arises. In general,
if a large number of keys are congruent modulo d and if D is a multiple of d,
then the use of D as a divisor will result in poor performance. This is the
basic argument, as suggested by Buchholz, that the largest prime number close to
but less than the size of the address space should be selected as the divisor.
However, it appears that most non-prime numbers are valid candidates for
divisors since inferior results are relatively rare, and since they often
outperform the prime numbers. It is advisable, nevertheless, to choose divisors
which do not contain small prime numbers as factors. This, of course, eliminates
even numbers.
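
The parity effect described above is easy to reproduce. The Python sketch below uses a made-up key set of 1,000 odd numbers (an extreme case of "a preponderance of odd numbers") and counts how many distinct addresses each divisor actually produces; the function name and the key set are illustrative assumptions.

```python
def distinct_addresses(keys: list[int], divisor: int) -> int:
    """Number of different addresses produced by the division method for these keys."""
    return len({key % divisor for key in keys})


odd_keys = list(range(1, 2001, 2))  # hypothetical key set: 1, 3, 5, ..., 1999

# An even divisor preserves oddness, so only the odd addresses are ever used.
print(distinct_addresses(odd_keys, 100))
# An odd divisor of similar size spreads the same keys over every address.
print(distinct_addresses(odd_keys, 101))
```

With the even divisor, half of the address space stays empty while the other half absorbs every record, which is the skew that produced the poor results for two of the key sets.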

In  Lin's  method,  the  choice  of  a  slightly  different  p  can  change  the  results 
drastically.  The  reason  for  this  is  not  known.  Lin  showed  in  his  experiment 
that  his  technique  produced  addresses  close  to  a  Poisson  distribution.  Our  data 
also  tend  to  substantiate  Lin's  claim.  However,  as  Buchholz  believes,  perfect 
randomization  (Poisson  distribution  of  addresses)  is  not  a  desirable  goal.  Our 
experimental  results  confirm  his  belief.  All  transformations  (none  of  which 
produce perfect randomization) give better performance than true randomization.**

Transformation  by  digit  analysis  is  not  recommended.  Even  with  the  additional 
overhead  imposed  by  the  analysis,  the  results  were  not  satisfactory.  Truncation 
by  observation  may  eliminate  the  analysis  but  it  is  also  not  very  reliable. 

The  mid-square  transformation  technique,  as  mentioned  before,  can  be  applied  with 
some  confidence  that  fairly  good  results  will  be  obtained.  Although  it  is  not 
apparent  in  our  tabulated  results,  this  technique  can  also  produce  unexpectedly 
poor  performances.  For  long  keys  and  short  addresses,  and  if  the  middle  digits 
of the keys vary little, a large number of distinct keys will all be mapped to
the same bucket.

It is easiest to compute an address by the method of folding keys when the key
length  is  long  and  the  desired  address  fits  into  one  computer  word.  The 
operations,  shifting  and  adding,  are  much  easier  to  carry  out  than  the  operations 
associated  with  other  techniques.  For  key  length  nearly  the  same  as  address 
length,  this  method  behaves  like  the  division  method. 
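
As a rough illustration of the shifting and adding involved, the sketch below folds a binary key into an address of a fixed number of bits. The function name `fold_key`, the choice of binary pieces, and the decision to discard the carry are assumptions made for the example; the report does not specify which folding variant was used.

```python
def fold_key(key: int, address_bits: int) -> int:
    """Fold a non-negative key into an address by adding fixed-width pieces."""
    mask = (1 << address_bits) - 1
    total = 0
    while key > 0:
        total += key & mask    # take the low-order piece
        key >>= address_bits   # shift the next piece into place
    return total & mask        # discard any carry out of the address width

address = fold_key(0x12345678, 16)  # adds the pieces 0x1234 and 0x5678
```

A key that already fits within the address width is returned unchanged, which is the sense in which folding degenerates when key length and address length are nearly equal.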

By far the most complicated method is the algebraic coding technique. Here
even the choice of a generating polynomial to guarantee a minimum distance in the
code is not an easy task. Experiments with codes of various minimum distances
in GF(2) have been carried out to test the claims of advantages given by the
proponents of this method [3, 4]. They believed that large minimum distances of
a code assure good performance. The data obtained do not substantiate their
assertion since there does not seem to exist any correlation between performance
and the distance of a code. Neither larger nor smaller distances produce
uniformly better results. Codes chosen at random consistently perform equally well.
The  choice  of  the  Galois  field  also  does  not  seem  to  be  important.  As  shown  in 
the  tables,  the  performance  figures  are  nearly  the  same  for  both  fields  used 
in  the  experiment. 

CONCLUSIONS 

Pragmatic  Choice  of  Transforms 


Faced  with  an  arbitrary  key  set,  the  selection  of  a  transformation  technique  is 
obvious;  the  division  method  is  preferred.  While  other  techniques  may  sometimes 
perform  better,  one  also  risks  obtaining  inferior  results  more  often.  In  the 
division  method,  the  choice  of  a  divisor  is  not  necessarily  limited  to  prime 
numbers. Selecting a number which does not contain any prime factor below,
say, 20 is probably sufficient to assure good performance.
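
One way to apply this rule of thumb is sketched below. The helper names `smallest_prime_factor` and `choose_divisor`, and the strategy of searching downward from the size of the address space, are assumptions made for illustration; only the threshold of 20 comes from the text.

```python
def smallest_prime_factor(n: int) -> int:
    """Return the smallest prime factor of n (n itself if n is prime or 1)."""
    factor = 2
    while factor * factor <= n:
        if n % factor == 0:
            return factor
        factor += 1
    return n

def choose_divisor(address_space: int, min_factor: int = 20) -> int:
    """Largest d <= address_space having no prime factor below min_factor."""
    d = address_space
    while d > 1 and smallest_prime_factor(d) < min_factor:
        d -= 1
    return d

divisor = choose_divisor(530)  # 529 = 23 * 23 is accepted although it is not prime
```

The search accepts composite numbers such as 529, in line with the observation that the divisor need not be prime.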

Overflow  Handling 

The  overflow  handling  technique  to  be  used  depends  on  bucket  size.  If  the  bucket 
size  is  less  than  10  records,  open  addressing  should  not  be  employed.  For  larger 
sizes,  this  technique  can  be  applied  to  save  storage  space  and  yet  maintain 
good  performance.  Chained  overflow  handling,  however,  generally  gives  a  much 
more predictable result because overflow records do not affect prime storage space.
However, in determining which overflow handling technique is superior, one must
take into account the characteristics of the storage device and the operational
system. For example, if disks are used, arm motion must be analyzed with the
number of accesses given in the experiment. If chained overflow requires a
large  number  of  arm  movements,  then  it  may  become  impractical.  On  the  other  hand, 
if  the  system  misses  rotations  when  accessing  successive  buckets,  open  addressing 
may  become  just  as  expensive. 

Static  and  Dynamic  Data  Sets 

Although  the  study  here  has  been  limited  to  static  key  sets,  it  is  believed  that 
the  data  obtained  are  applicable  to  dynamic  situations  where  keys  can  be  deleted 
or added. From the results of Olson, it can be seen that the difference
between a dynamic situation and its static analog (initial loading in Olson) is
relatively very small when compared to the deviations produced by the
transformations themselves. Consequently, the results here can still be used as a guide
in both situations.

Characterization Functions for Transformations and Key Sets

The  previous  discussions  are  pragmatic  statements  based  on  the  results  of  our 
study.  This  study  has  led  us  to  a  conclusion  that  it  is  desirable  to  create  a 
more  quantitative  fundamental  approach. 

As  mentioned  earlier,  a  comparison  of  the  data  obtained  in  this  study  and  that 
based  on  the  Poisson  distribution  of  addresses  indicates  that  the  idea  of 
finding a transformation technique that will "randomize" a key set is a
misconstrued objective. This misconception is often the result of the belief that
a "randomizing" transformation will map the keys into evenly distributed addresses.
Actually,  an  ideal  transformation  method  must  map  all  keys  in  a  file  to  distinct 
addresses.  Uniformity  in  the  distribution  of  addresses  is  not  synonymous  with 
the  mapping  of  a  key  into  addresses  with  equal  probability.  Consequently,  as 
believed  by  Buchholz*  and  substantiated  by  the  results  in  this  paper,  an 
efficient  transformation  method  should  preserve  whatever  uniformity  exists  in 
the  keys. 

Based on this information, we would like to find a set of characterization functions
for  the  transformations.  We  would  like  to  classify  these  transformation  methods 
with  respect  to  their  capabilities  in  preserving  local  uniformities  in  the  key 
set  or  randomizing  the  keys  into  addresses.  Since  the  distribution  of  a  key  set 
also  plays  an  important  role  in  the  performance  of  a  transformation,  we  would 
also  want  to  define  a  set  of  characterization  functions  for  the  key  sets.  If 
the  characterization  functions  are  meaningfully  selected,  we  should  be  able  to 
determine  which  transformation  is  likely  to  perform  well  on  a  given  set  of  keys. 

Let us consider the techniques used by the transformation methods. Generally
speaking,  all  transformation  methods  may  be  characterized  as  either  distributive 
or  randomizing  according  to  the  manner  in  which  the  addresses  are  generated  from 
the  keys.  A  distributive  transformation  preserves  the  order  of  the  keys  in  the 
resulting  addresses  to  a  large  extent;  a  randomizing  transformation  destroys  the 
order  completely. 

Let us define more precisely the two terms, distributive and randomizing. Let
$k_0, k_1, \ldots, k_{n-1}$ be n numerically consecutive keys. Let $0, 1, 2, \ldots, n-1$
be the range of the mapping, consisting of the addresses of the available slots;
n is an arbitrary integer and is fairly large; for purposes of standardization,
we choose 1000. Since the slots have distinct addresses, these two terms will be
used interchangeably. A transformation T will map each key to one of the
addresses. Let us suppose that the keys $k_i$, $i = 0, 1, \ldots, n-1$, are transformed
as given by $s_i = T(k_i) \bmod n$. Note that the images of the mapping are not
necessarily distinct and that $s_{i+1}$ does not necessarily follow $s_i$ in the
sequence of addresses. A set of $j + 1$ images, $s_i, s_{i+1}, \ldots, s_{i+j}$, from the
set of keys $k_i < k_{i+1} < \cdots < k_{i+j}$ is said to be in order if and only if there
exists an integer b such that $s_i' < s_{i+1}' < \cdots < s_{i+j}'$ or
$s_i' > s_{i+1}' > \cdots > s_{i+j}'$, where $s_\ell' = (b + s_\ell) \bmod n$. The order is
said to be destroyed otherwise.

Essentially,  the  above  statement  specifies  that  the  addresses  are  circular  and  that 
the  set  of  images  are  in  order  if  these  images  can  be  arranged  in  one  of  the  two 
ways just given by cyclic shifts. Let us define the order length of a
transformation T to be the integer m given as follows: Let T map the keys
$k_0, k_1, \ldots, k_{n-1}$ into the addresses one at a time, starting from $k_0$ and
carrying on sequentially.
Let  this  process  be  repeated  many  times  starting  from  different  keys.  On  the 
average  let  the  mth  key  be  the  first  key  that  is  mapped  into  an  address  resulting  in 
a  destruction  of  order.  Then  T  is  said  to  have  an  order  length  equal  to  m. 

T is said to be perfectly distributive if m is equal to n + 1. It is said to
be  distributive  if  m  is  large  and  randomizing  if  m  is  small. 

Let  us  define  a  second  parameter,  collision  length.  Again,  let  T  map  the  keys 
into  addresses  in  the  same  manner  as  above.  On  the  average  let  the  cth  key  be 
the  first  key  mapped  into  an  address  such  that  the  addresses  of  the  c  keys  just 
obtained  are  no  longer  all  distinct.  (Here  the  order  of  the  key  set  is  not 
necessarily preserved.) c is said to be the collision length of T. By
definition, a perfectly distributive transformation has a collision length equal to n + 1.
A randomizing transformation will give a smaller collision length. However,
transformations having small order lengths do not necessarily possess small collision
lengths. Conventionally, a measure of the efficiency of a transformation method
is the uniformity of the distribution of the addresses. If a file is accessed
randomly all the time, this measure, based on uniformity, is satisfactory. However,
sometimes sequential retrieval of the keys may be required. Since mass storage
devices  are  usually  not  truly  random,  preserving  the  order  of  the  keys  becomes 
advantageous.  For  sequential  retrieval,  a  perfect  distributive  method  which 
preserves  order  as  well  as  giving  a  uniform  distribution  of  addresses  intuitively 
seems  desirable.  On  the  other  hand,  if  only  random  accessing  capability  is 
concerned,  the  parameter  collision  length  may  be  a  satisfactory  performance 
indicator of a transformation method. Of course, due to the arbitrary
distribution of a key set, a closer to perfect distributive method does not necessarily
give  better  performance  for  a  particular  file.  It  can  only  be  expected  to  have 
good  results  on  the  average. 
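
The collision length defined above can be estimated directly, as in the following minimal sketch. The function names, the list of starting keys `starts`, and the convention of returning one more than the number of keys when no collision occurs are assumptions chosen to match the definition; the squaring transform is only a stand-in for a non-distributive method.

```python
def first_collision(keys, transform, n):
    """Position (counting from 1) of the first key whose address repeats an
    earlier address; len(keys) + 1 if all addresses are distinct."""
    seen = set()
    for position, key in enumerate(keys, start=1):
        address = transform(key) % n
        if address in seen:
            return position
        seen.add(address)
    return len(keys) + 1

def collision_length(transform, n, starts):
    """Average first_collision over runs of n consecutive keys."""
    runs = [first_collision(range(s, s + n), transform, n) for s in starts]
    return sum(runs) / len(runs)

n = 1000
division = collision_length(lambda k: k, n, starts=[0, 123, 5000])
squaring = collision_length(lambda k: k * k, n, starts=[0, 123, 5000])
```

Under these assumptions the division transform reaches the maximum value n + 1 on consecutive keys, while the squaring transform collides much earlier.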

Of the methods studied in this paper, the division method is an example of a
perfectly distributive method. Lin's method is much closer to randomization.
Categorization of the other methods is not so obvious because their characteristics depend
on  parameters  such  as  key  length,  address  length,  and  the  kind  of  operations 
used.  For  example,  the  mid-square  method  probably  has  high  order  and  collision 
lengths  except  where  keys  have  a  string  of  zeros  giving  all  zeros  in  the  address 
portion  of  the  product. 

In addition to the categorization of the transformation methods, one should also
consider  the  classification  of  the  files  by  their  statistical  properties.  As 
mentioned  before,  it  is  quite  possible  that  a  particular  transformation  technique 
works  exceptionally  well  for  a  certain  kind  of  key  distribution  and  poorly  on 
others.  However,  before  making  an  investigation  in  this  direction,  we  should 
first  consider  what  statistical  characteristics  are  most  appropriate. 

One  approach  to  the  classification  problem  is  to  find  the  underlying  distributions 
from which the keys are obtained. However, because of the discrete nature of the
situation and the possibly arbitrary way in which a key is selected, one will
frequently have difficulty in identifying the underlying distributions. A much
easier  and  still  meaningful  approach  is  to  find  the  distribution  of  the  cluster 
lengths  and  the  gap  lengths  between  clusters  in  a  key  set.  (A  cluster  here  is 
a  set  of  numerically  consecutive  keys  separated  at  both  ends  from  other  keys.) 
Perhaps  the  means  and  variances  of  the  cluster  lengths  and  gap  lengths  will  be 
sufficient  to  classify  a  set  of  keys  for  our  purpose.  In  addition,  the  density 
of  the  key  set,  or  the  ratio  of  the  number  of  keys  in  a  file  to  the  number  of  keys 
possible in the key space, also plays an important role and should be taken into
account.
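
These statistics are straightforward to compute. The sketch below assumes integer keys, a non-empty key set, and a gap length defined as the number of unused key values between two neighboring clusters; the function name `cluster_gap_stats` and the sample keys are hypothetical.

```python
from statistics import mean, pvariance

def cluster_gap_stats(keys, key_space_size):
    """Return (mean cluster length, variance of cluster length,
    mean gap length, variance of gap length, density)."""
    ordered = sorted(set(keys))
    clusters, gaps = [], []
    run = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur == prev + 1:
            run += 1                     # still inside the same cluster
        else:
            clusters.append(run)         # the cluster ended at prev
            gaps.append(cur - prev - 1)  # unused key values before cur
            run = 1
    clusters.append(run)                 # close the final cluster
    density = len(ordered) / key_space_size
    mean_gap = mean(gaps) if gaps else 0.0
    var_gap = pvariance(gaps) if gaps else 0.0
    return mean(clusters), pvariance(clusters), mean_gap, var_gap, density

# Hypothetical key set: clusters {1,2,3}, {7,8}, {20} in a key space of 100 values.
stats = cluster_gap_stats([1, 2, 3, 7, 8, 20], key_space_size=100)
```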


Let us discuss briefly the importance of these parameters. Let $m_c$, $m_g$, $v_c$ and
$v_g$ be the means and variances of the cluster lengths and gap lengths respectively,
with the subscripts c and g denoting cluster and gap. Let d be the density of
a key set. If $m_c$, $m_g$, $v_c$ and $v_g$ are all small, the keys are scattered
throughout the range rather evenly and a distributive method probably will have little
advantage over a randomizing method. With a large $m_c$, a distributive method
is expected to do well because the key distribution here resembles the set of
keys used to define the characteristics of a transformation. The parameter gap
length is particularly important to the distributive methods where a wrong choice
of parameters may result in many records being mapped to the same address. This
can happen easily in the division method. For example, if $v_g$ is very small,
one must select a divisor in the division method not equal to $m_g$ in order to
avoid an excessive number of records from going to the same address. The third
parameter, density of a key set, has great influence in altering the performance
characteristics of a transformation method. For instance, if d is large, the
folding method can behave almost like the division method.

The discussion of the categorization of transformation methods and the
classification of key set distribution suggests a further experiment to test the various
conjectures. In this experiment, the transformation methods should first be
characterized with the use of the parameters of order length and collision length.
Then we proceed to obtain the parameters $m_c$, $m_g$, $v_c$, $v_g$ and d from the key
sets, which ideally should include some files with simple underlying
distributions as well as some files with arbitrary underlying distribution. Applying
each  method  to  the  key  sets,  we  may  be  able  to  derive  from  the  results  the 
correlation  between  the  two  sets  of  parameters  with  respect  to  performance.  If 
this  can  be  done,  then  we  know  how  to  select  a  transformation  method  and  associated 
parameters  whenever  some  simple  statistics  of  a  key  set  are  available.  In  the 
absence  of  any  statistics,  we  can  always  simply  use  the  division  method  with  an 
arbitrary  divisor  as  suggested  earlier. 

*For each method the first row is the average number of accesses per record and the second row the
standard error. For each bucket size, the first column is the result of chaining overflow and the
second the result of open addressing.

ABSTRACT: In this report, formal definitions of the concepts of secondary index,
key index, query and query load are given. This is done for the case of a single
relation (a subset of the cartesian product of a number of domains). The
definitions are used to formulate a problem in secondary indexes and show how the
concept of query load is related to the concept of secondary indexes. An
evaluation criterion is formulated which pinpoints the kind of input data that is
needed to evaluate various selections of secondary indexes to match the query
load.

INTRODUCTION


In  information  retrieval  systems,  data  is  stored  on  the  peripheral  storage 
devices  in  several  possible  ways.  One  method  is  to  divide  the  data  into  records, 
each  of  which  has  a  unique  identifier  called  the  primary  key  and  to  physically 
store  these  records  so  that  a  record  can  be  easily  retrieved  if  the  key  for  that 
record  is  given.  However,  if  a  request  is  made  to  retrieve  a  record  by  giving 
the  value  of  a  particular  attribute  other  than  the  key  attribute,  all  of  the  data 
has to be retrieved and examined in order to respond to the request. In order to
make this kind of retrieval more effective, an auxiliary table may be created
which either directly gives the addresses of those records that have specified
values for the given attribute, or indirectly gives the list of keys for those
records having the given attribute value. When such a table is created, we say
that  a  secondary  index  has  been  created  for  the  data. 
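
A minimal sketch of the indirect form of such a table, which lists primary keys for each attribute value, is given below. The record layout, the attribute names, and the function name `build_secondary_index` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

def build_secondary_index(records, attribute):
    """Map each value of `attribute` to the list of primary keys having it."""
    index = defaultdict(list)
    for key, record in records.items():
        index[record[attribute]].append(key)
    return dict(index)

# Hypothetical file: primary key -> record.
records = {
    101: {"dept": "A", "grade": 3},
    102: {"dept": "B", "grade": 3},
    103: {"dept": "A", "grade": 5},
}
dept_index = build_secondary_index(records, "dept")
```

A request for all records with a given department value then needs only one lookup in `dept_index` followed by retrieval by primary key, instead of an examination of the whole file.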

Clearly,  this  added  retrieval  capability  becomes  more  desirable  as  the  number  of 
requests  for  records  using  this  attribute  increases.  However,  the  price  to  be 
paid  for  this  capability  is  the  added  amount  of  space  required  for  the  storage 
of the table. Requests for retrieval of records are called queries. If a
number  of  different  attributes  arise  in  the  queries  which  refer  to  this  data, 
the  cost  of  storing  the  additional  secondary  indexes  to  accommodate  the  attributes 
increases. The problem of secondary indexes can then be phrased as follows: In
view of all the queries on the data, what set of secondary indexes should be
selected to facilitate the retrieval and keep the storage costs down?

In  the  following  sections,  formal  definitions  are  given  for  the  concepts  of  key 
index, secondary index, query and query load in a set-theoretical framework.

We  restrict  our  considerations  to  a  single  relational  set  R  which  is  defined 
as  a  subset  of  the  cartesian  product  of  a  number  of  domains.  This  formalism 
enables  us  to  define  the  concepts  of  key  index  and  secondary  index  in  terms  of 
partitions  of  R  into  subsets.  It  also  enables  us  to  give  unambiguously  the 
essential  features  of  a  query  which  include  the  subset  to  be  retrieved. 

Once these notions are clearly specified, a quantitative measure of compatibility
between  index  induced  partitions  of  R  and  the  pertinent  subset  of  data  specified 
by  a  query  is  introduced.  An  evaluation  criterion,  in  terms  of  these  compatibility 
measures  and  the  parameters  extracted  from  the  query  load,  can  then  be  established 
to  aid  in  the  selection  of  secondary  indexes. 

1.  SECONDARY  INDEXES  AND  PARTITIONS 


In this section, we consider a relation R as a subset of the cartesian product
of elementary domains $A_i$, $i = 0, 1, \ldots, k$.

Thus $R \subset A = A_0 \times A_1 \times \cdots \times A_k$.

For each i, we consider the projection of the cartesian product A onto the
factor $A_i$. This projection defines a function $\Pi_i : R \to A_i$ which we call the
projection of R into $A_i$.

In the case where $\Pi_i : R \to A_i$ is 1 : 1, we say that the domain $A_i$ is a key
domain. Thus a domain is a key domain for the relation R if the projection
satisfies the condition: If $\Pi_i(r_1) = \Pi_i(r_2)$, then $r_1 = r_2$.

We shall assume that $A_0$ is a key domain for the relation R. We let $R_0 \subset A_0$
be the image of R under the projection $\Pi_0$. Thus $\Pi_0 : R \to R_0$ establishes a
one-to-one correspondence between the elements of R and the elements of $R_0$,
which are called the keys for R.

Since $\Pi_0$ is an isomorphism between R and $R_0$, we shall, in the following
discussion, deal with R. However, the entire development can be done in terms
of $R_0$.

We introduce the concept of a partition of R. We say that a collection $\mathcal{P}(R)$
of subsets of R forms a partition of R if:

(1) For $B_1$ and $B_2$ in $\mathcal{P}(R)$, $B_1 \cap B_2 = \emptyset$ whenever $B_1 \neq B_2$, i.e., the
elements of $\mathcal{P}(R)$ are pairwise disjoint.

(2) R is the union of the elements of $\mathcal{P}(R)$, i.e., $R = \bigcup_{B \in \mathcal{P}(R)} B$.

We now show that partitions of R arise in very natural ways. Let $L = \{0, 1, \ldots, k\}$
and let $A = A_0 \times A_1 \times \cdots \times A_k$ be the domain of R, i.e., $R \subset A$. For each
subset $K \subset L$, we form the cartesian product $A_K = \prod_{i \in K} A_i$ and let
$\Pi_K : R \to A_K$ be the projection function from R to $A_K$.

For each a in $A_K$, we let $R(a, K) = \{r \in R \mid \Pi_K(r) = a\}$ and let
$\mathcal{P}_K(R) = \{R(a, K) \mid a \in A_K\}$. It is easy to see that $\mathcal{P}_K(R)$ is a partition of R.
We call this the partition of R induced by the projection $\Pi_K$.


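If $K = \{k\}$, we write $\mathcal{P}_k(R)$ instead of $\mathcal{P}_{\{k\}}(R)$.

If $\mathcal{P}_{K_1}$ and $\mathcal{P}_{K_2}$ are two partitions of R, we define the intersection of the
partitions, denoted $\mathcal{P}_{K_1} \cap \mathcal{P}_{K_2}$, to be that partition of R consisting of all
sets of the form $A \cap B$ where $A \in \mathcal{P}_{K_1}$ and $B \in \mathcal{P}_{K_2}$.

It is easy to extend this concept of intersection of partitions to a family of
partitions. We can show that if $K \subset L$, then the partition $\mathcal{P}_K$ induced on
R by $\Pi_K$ is the intersection of the partitions $\mathcal{P}_i$ for $i \in K$.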


We are now in a position to define the concept of secondary index. Let $\mathcal{P}_K$
be the partition of R induced by the projection $\Pi_K : R \to A_K$. The function
$\phi_K : A_K \to \mathcal{P}_K$ defined by $\phi_K(a) = R(a, K)$ is called an index of R with respect
to $A_K$.

Thus, we see that an index is a function between the domain $A_K$ and the
partition $\mathcal{P}_K$ induced by the projection $\Pi_K$.

In the case where $\Pi_K$ is 1 : 1, the partition $\mathcal{P}_K(R)$ is in 1 : 1 correspondence
with the elements of R and the corresponding index $\phi_K$ may serve as a key
index.
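
The construction can be made concrete with a small sketch in which a relation is a set of tuples and a set K of domains is given as a tuple of positions. The function name `induced_partition` and the sample relation are assumptions for illustration; only attribute values that actually occur are listed, so no block of the partition is empty.

```python
def induced_partition(relation, K):
    """Group the tuples of `relation` by their projection onto the positions in K.

    Returns a dict phi with phi[a] = R(a, K); its values are the blocks
    of the partition induced by the projection.
    """
    phi = {}
    for r in relation:
        a = tuple(r[i] for i in K)
        phi.setdefault(a, set()).add(r)
    return phi

# Hypothetical relation over three domains; position 0 acts as the key domain.
R = {(1, "x", 10), (2, "x", 20), (3, "y", 10)}

phi_1 = induced_partition(R, (1,))     # index on domain 1
phi_2 = induced_partition(R, (2,))     # index on domain 2
phi_12 = induced_partition(R, (1, 2))  # index on domains 1 and 2 together
```

In this example the blocks of `phi_12` are exactly the non-empty intersections of a block of `phi_1` with a block of `phi_2`, which is the intersection property stated above.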

2. QUERIES AND THEIR FORMAL DEFINITIONS


In this section, we shall formulate definitions for a query and show how they
relate  to  the  concepts  of  index  and  partition  defined  in  section  1. 

First,  we  observe  that  given  a  relation  R,  a  query  is  a  request  for  the  values 
in  a  selected  set  of  domains  for  a  subset  of  R.  The  subset  of  R  is  specified 
by  giving  a  qualifying  expression. 

Thus, for example, a query can take the form: What is the value in domain $A_j$
for all elements of R which have a value b in domain $A_i$? Thus, a query Q
can be thought of as having two parts, a specification part and an output part.
Formally, we write $Q = (Q_s, Q_o)$. Here $Q_s$ is the specification part of the
query and defines a subset of R. $Q_o$ is the output part of the query and
specifies which domains are required to be displayed for the subset of R
specified by $Q_s$. Thus, $Q_o$ must specify which domains are to be displayed.
This can be effectively achieved in the case where the relation R is a subset
of A, the cartesian product of the set of domains $\{A_i \mid i = 0, 1, \ldots, k\}$. If we
let $L = \{0, 1, \ldots, k\}$, then $Q_o$ can be specified as a subset of L, i.e.,
$Q_o \subset L$. For the remainder of this paper, we shall be concerned with the
specification part of the query.

The function of the specification $Q_s$ of the query Q is to determine a
particular subset of R. In order to do this, observe that the only means we
have to use in the specification of a subset of R are through the use of the
projections of R to the elementary domains. Thus, we let p(a,i) stand for
the expression $\Pi_i(r) = a$. The subset of R defined by p(a,i) will be
denoted by R(a,i). Thus, we have that $R(a,i) = \{r \mid \Pi_i(r) = a\}$. We shall call
an expression of the form p(a,i) a primitive expression and the corresponding
set R(a,i) a basic subset of R. If we compare the definition of R(a,i)
with the concept of partition $\mathcal{P}_i(R)$, we see that R(a,i) is an element of the
partition $\mathcal{P}_i(R)$. In fact, it is the image of the index $\phi_i : A_i \to \mathcal{P}_i(R)$
evaluated at a, i.e., $R(a,i) = \phi_i(a)$. In this case, we see that the query
specification defines a particular element of a partition. We are now in a
position to define the scope of $Q_s$, the specification part of the query. We
take $Q_s$ to be a subset of R defined by $Q_s = \{r \mid P(r)\}$ where P(r) is a
well-formed formula of the first order predicate calculus which has no quantifiers
and is obtained by using the primitive expressions defined above. We restrict
ourselves to this type of query specification in order to avoid the myriad
complications which arise by allowing quantifiers.

Now it is well known that every formula P(r) can be transformed into an
equivalent formula which is in what is called the disjunctive normal form. We say
that a formula is in the disjunctive normal form if it is of the form
$P_1 \vee P_2 \vee \cdots \vee P_n$, where each $P_i$ is the conjunction formed from the elementary
expressions and their negations.

Thus, P(r), the qualification statement, has the form $P_1(r) \vee P_2(r) \vee \cdots \vee P_n(r)$
where each $P_i(r)$ is the conjunction of terms as above. Since each $P_i(r)$
defines a subset of R by the formula $B_i = \{r \mid P_i(r)\}$, we find that the qualified
subset $Q_s$ of the query can be written as the union of the sets $B_i$. We shall
look a little closer at the expression $P_i(r)$.
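
The decomposition of the qualified subset into a union of sets, one per conjunction, can be sketched as follows. The representation of a term as a triple (domain position, value, positive flag), the function names, and the sample relation are assumptions made for the example.

```python
def eval_conjunction(relation, conjunction):
    """Tuples satisfying every term (position, value, positive) of one conjunction."""
    return {
        r for r in relation
        if all((r[j] == a) == positive for j, a, positive in conjunction)
    }

def eval_query(relation, dnf):
    """Union of the sets defined by the conjunctions of a disjunctive normal form."""
    result = set()
    for conjunction in dnf:
        result |= eval_conjunction(relation, conjunction)
    return result

# Hypothetical relation and specification:
# (domain 1 = "x" and domain 2 != 10) or (domain 1 = "y").
R = {(1, "x", 10), (2, "x", 20), (3, "y", 10)}
dnf = [
    [(1, "x", True), (2, 10, False)],
    [(1, "y", True)],
]
qualified = eval_query(R, dnf)
```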

Let $P_i(r)$ be the conjunction of terms, each of which is either an elementary
expression or the negation of an elementary expression. We observe that in this
conjunction at most one elementary expression for each domain can occur. Thus,
we observe that the elementary domains represented in the formula for $P_i(r)$
form a subset of all the elementary domains and can be characterized by a subset
K of $L = \{0, 1, \ldots, k\}$.

Thus, $P_i(r) = \bigwedge_{j \in K} q_j$,

where $q_j$ is an elementary expression of one of the forms:

(1) $\Pi_j(r) = a_j$ for some $a_j \in A_j$.

(2) $\Pi_j(r) \neq a_j$ for some $a_j \in A_j$.

The set K can be divided into two subsets $K_+$ and $K_-$:

$j \in K_+$ if $q_j$ is $\Pi_j(r) = a_j$ for $a_j \in A_j$,

$j \in K_-$ if $q_j$ is $\Pi_j(r) \neq a_j$ for $a_j \in A_j$.

We then define for each $P_i(r)$ the corresponding conjunction $P_i'(r)$ defined
by

$P_i'(r) = \bigwedge_{j \in K_+} q_j$.

Clearly, $P_i(r)$ implies $P_i'(r)$ because of the tautology

$p \wedge q \rightarrow p$.

In this case, take $p = \bigwedge_{j \in K_+} q_j$ and $q = \bigwedge_{j \in K_-} q_j$.

Then $p \wedge q = \bigwedge_{j \in K} q_j = P_i(r)$
and the result follows.


Now for each $P_i(r)$, let $P_i'(r)$ be the conjunction obtained as above and define
$B_i = \{r \mid P_i(r)\}$ and

$B_i' = \{r \mid P_i'(r)\}$.

Some remarks are in order relative to the set $B_i'$.


If $K_+ = \emptyset$, i.e., $K_- = K$, then all the conjuncts are negations of elementary
expressions. In this case, we take $B_i' = R$.

Thus, for each query $Q = (Q_s, Q_o)$ we can assign two families of sets, namely
$B_i$, $i = 1, \ldots, n$ and $B_i'$, $i = 1, \ldots, n$, corresponding to the specification
expression's disjunctive normal form
$P_1(r) \vee P_2(r) \vee \cdots \vee P_n(r)$, where each $P_i(r)$ is the conjunction of elementary
terms. Observe that $B_i \subset B_i'$ because $P_i(r)$ implies $P_i'(r)$.
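
The relation between the two families of sets can be checked on a small example. In the sketch below a term is a hypothetical triple (domain position, value, positive flag), the positive terms play the role of the equality conjuncts, and the function names and sample relation are assumptions.

```python
def satisfying(relation, conjunction):
    """Tuples of `relation` that satisfy every term of `conjunction`."""
    return {
        r for r in relation
        if all((r[j] == a) == positive for j, a, positive in conjunction)
    }

def positive_part(conjunction):
    """Keep only the equality terms, dropping the negated ones."""
    return [term for term in conjunction if term[2]]

R = {(1, "x", 10), (2, "x", 20), (3, "y", 10)}

mixed = [(1, "x", True), (2, 10, False)]  # domain 1 = "x" and domain 2 != 10
B = satisfying(R, mixed)
B_prime = satisfying(R, positive_part(mixed))

only_negations = [(2, 10, False)]         # no equality terms at all
B_neg_prime = satisfying(R, positive_part(only_negations))
```

Dropping the negated terms can only enlarge the set, and when no equality term remains the empty conjunction is satisfied by every tuple, so the relaxed set is all of R.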

3. QUERY-PARTITION COMPATIBILITY

In  this  section,  we  shall  introduce  a  qualitative  method  for  comparing  a  query 
with  a  partition.  First  we  define  the  notion  of  a  set  B  being  compatible  with 
a  partition.  Then  we  extend  this  concept  to  a  query  and  finally,  to  a  query  load. 

Let B be a subset of R and $\mathcal{P}$ a partition of R.

Thus, $\mathcal{P} = \{R_\alpha \mid \alpha \in \mathfrak{a}\}$ where the $R_\alpha$ are pairwise disjoint and their union is R.

We say that B is equi-compatible with $\mathcal{P}$ if

$B = R_\alpha$ for some $\alpha \in \mathfrak{a}$,

B is under-compatible with $\mathcal{P}$ if

$B \subset R_\alpha$ for some $\alpha \in \mathfrak{a}$,

and B is over-compatible with $\mathcal{P}$ if

$B = \bigcup_{\alpha \in \delta} R_\alpha$ for some $\delta \subset \mathfrak{a}$.

B is said to be in-compatible with $\mathcal{P}$ if none of the above three conditions hold.
If B is not incompatible with $\mathcal{P}$, then B is said to be compatible with $\mathcal{P}$.

It is always possible to find a subset $\gamma$ of $\mathfrak{a}$ such that $B \subset \bigcup_{\alpha \in \gamma} R_\alpha$. The
above definitions of compatibility are used to distinguish the special cases
where the subset $\gamma$ consists of a single element of $\mathfrak{a}$ or where the inclusion
is actually an equality.
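
The four cases can be told apart mechanically, as in the sketch below. The function name `classify`, the representation of a partition as a list of disjoint sets, and the treatment of under-compatibility as a proper inclusion (so that equality is reported as equi-compatible) are assumptions made for illustration.

```python
def classify(B, partition):
    """Classify subset B against a partition given as a list of disjoint sets."""
    B = set(B)
    if any(B == block for block in partition):
        return "equi-compatible"
    if any(B < block for block in partition):   # proper subset of one block
        return "under-compatible"
    touched = [block for block in partition if block & B]
    if touched and set().union(*touched) == B:  # exact union of several blocks
        return "over-compatible"
    return "in-compatible"

partition = [{1, 2}, {3}, {4, 5}]
labels = [classify(B, partition) for B in ({1, 2}, {1}, {1, 2, 3}, {1, 3})]
```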

Let $\{B_i \mid i = 1, \ldots, n\}$ be the family of sets corresponding to the query
specifications as given in section 2.

We say that the query $Q = (Q_s, Q_o)$ is compatible with the partition $\mathcal{P}$ if each
$B_i$ is compatible with $\mathcal{P}$.

Now that we have defined the compatibility of a query with a partition, we
introduce the notion of a query load and extend the concept of compatibility to
that of the query load being compatible with the given partition.

The  concept  of  query  load  is  obtained  from  the  intuitive  notion  of  all  the 
queries  which  are  formulated  for  a  period  of  time  for  the  relation  R.  Thus, 
suppose that over a period of time, the set of queries $\{Q_i \mid i = 1, \ldots, N\}$
are observed, each occurring with a frequency $h_i$, $i = 1, \ldots, N$. Then this gives
some  indication  of  the  requirements  of  the  system  with  respect  to  questions  asked 
of  it  for  the  relation  R. 

Thus,  we  shall  define  a  query  load  ^fto  be  the  set  of  ordered  pairs 


{(Qp  h.)|  i  =  1,...,N},  where  is  a  query  and  hj  is 


an  integer  representing  the  frequency  of  occurrence  of  Q.  . 


>'«(.•  could  extend  the  notion  of  the  query  load 


2S? 


to  be  compatible  with  the 


partition  J^in  the  obvious  way,  i.e.,  by  requiring  that  each  Q.  be  compatible 

/■ts}  J 


with 


However,  this  approach  leads  to  a  classification  of  a  query  load  and  a  partition 
being  compatible  which  ignores  the  frequency  of  a  query.  Thus,  for  example,  a 
partition  may  be  compatible  with  all  but  a  single  query  in  the  query  load  and 
be  classified  as  being  incompatible.  Thus,  we  seek  to  introduce  a  concept  of 
degree  of  compatibility. 


As a first approximation, we assign some figures to measure the degree of compatibility between a set B and a partition $\mathcal{P}$.

Thus, we assign values $c_1$, $c_2$, $c_3$ and $c_4$ to the set B depending on its compatibility with the partition $\mathcal{P}$. The assignments made to B are given in Table I.


Value    Type of Compatibility

$c_1$    Equi-compatible
$c_2$    Over-compatible
$c_3$    Under-compatible
$c_4$    In-compatible

TABLE I


The values of the $c_i$'s have the further constraint of being ordered. Thus, we postulate that they satisfy the inequalities


C1  C2 


'4  • 


Intuitively, these inequalities reflect the notion that, for example, it is easier to retrieve B if it is equi-compatible with $\mathcal{P}$ than if it is over-compatible with $\mathcal{P}$.

Thus, to each query we assign a value which is obtained as the sum of the values $c_i$ for the individual sets associated with the query. Some care should be exercised here in the selection of the $c_i$, but we are interested in a first approximation for the measure of compatibility between a query Q and a partition $\mathcal{P}$. Thus, we can attach a measure of compatibility between a query Q and a partition $\mathcal{P}$ of R, which we denote by $c(Q, \mathcal{P})$.

If $\mathcal{L} = \{(Q_i, h_i) \mid i = 1, \ldots, N\}$ is a query load, then we can define a measure of compatibility between $\mathcal{L}$ and $\mathcal{P}$, denoted by $c(\mathcal{L}, \mathcal{P})$, by the formula

$$c(\mathcal{L}, \mathcal{P}) = \sum_{i=1}^{N} h_i \, c(Q_i, \mathcal{P}).$$
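A minimal sketch of this measure follows, under two stated assumptions: each query is represented by the list of compatibility types of its associated sets (already determined against the partition), and the load measure weights each query value by its frequency. The numeric values in `c` are placeholders chosen only for the example; the report leaves their selection to the designer.

```python
def query_measure(types, c):
    """Sum the values c[type] over the sets associated with one query."""
    return sum(c[t] for t in types)


def load_measure(load, c):
    """Frequency-weighted sum of query measures over a query load.

    load is a list of (types, h) pairs, where types lists the compatibility
    type of each set of the query and h is the query's frequency.
    """
    return sum(h * query_measure(types, c) for types, h in load)


# Hypothetical values for the four compatibility types (illustrative only).
c = {"equi": 4, "over": 3, "under": 2, "in": 1}

load = [
    (["equi"], 3),            # one equi-compatible set, asked 3 times
    (["under", "over"], 2),   # two sets, asked twice
    (["in"], 1),              # one in-compatible set, asked once
]
print(load_measure(load, c))  # 3*4 + 2*(2+3) + 1*1 = 23
```

Evaluating `load_measure` for the same load against several candidate partitions gives the kind of first-order comparison the text describes.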
By this procedure, we can compare the various partitions with respect to the query load $\mathcal{L}$ and thus obtain a first-order qualitative evaluation of the partitions of R with respect to a given query load.

This measure of compatibility between the query load $\mathcal{L}$ and the partition $\mathcal{P}$ can then be used to evaluate the effectiveness of choosing an index. This follows easily from the definition of index given in section 2. Thus, if I is the set of domains which are to be used as secondary indexes, we consider the partition $\mathcal{P}_I$ corresponding to this set I and evaluate the function $c(\mathcal{L}, \mathcal{P}_I)$ to give a figure of merit for the secondary indexes I, as related to the query load $\mathcal{L}$.

This framework can be extended in several directions, and the above discussion only serves as a first approximation to the selection of a set of secondary indexes which are responsive to a query load. The principal problem which the designer now must consider is the appropriate selection of the values $c_1$, $c_2$, $c_3$ and $c_4$. Those values should reflect the actual physical size of the data stored, and the access times for the retrieval of the atomic particles of the partition being considered.
SOME RESULTS ON STORAGE SPACE REQUIREMENT AND RETRIEVAL
TIME IN FORMATTED FILE ORGANIZATION *


by

S. P. Ghosh

Information Sciences Department
IBM Research Laboratory
San Jose, California

ABSTRACT: Some basic mathematical concepts underlying the interaction between queries and records of a file are discussed. Formulas for the storage space needed in file organization when chaining techniques are used are derived for Simple Formatted files and Hierarchical files. Some simplifications of the formulas in special cases are also discussed. The results for the chaining technique are compared with those for the Inverted File. The average retrieval time needed to retrieve records when chaining techniques are used is calculated and compared with that of the Inverted File organization.
1. INTRODUCTION


In electronic data processing, information is recorded as sequences of binary bits. A collection of information describing an event, a physical object, or any other type of entity is usually called a record. In general, a record is a collection of small information units which represent values for the attributes or properties of the entity. This record may be represented as $(v_1, v_2, \ldots, v_m)$, where the $v_i$'s are information units. Normally each entity is uniquely identified by the value of some attribute or combination of attributes to allow it to be discussed and maintained unambiguously. Sometimes an identification information unit is added to each record, or one or more information units in the record may be used as an identifier. The identifier is usually called the key of the record. Thus, if there are N records, they may be represented as:

$$(k_i;\ v_{i1}, v_{i2}, \ldots, v_{im_i}), \qquad i = 1, 2, \ldots, N,$$

where $k_i$ is the key and the $v_{ij}$'s are the $m_i$ different information units contained in the ith record.


A collection of records is called a file. If the information has some format structure imposed on it, then the records are called formatted records and the file is called a formatted file. In most data processing problems, the files are formatted. In this paper, we will consider only two types of structured files; namely, Simple Formatted files and Hierarchical files.

With regard to the contents of formatted records, the value or information unit relating to the same attribute is stored in a fixed position with respect to other information units in the record. This relatively fixed position in a record, and hence in the file, is often referred to as a field. Thus all the $v_{ij}$'s for a fixed j and i = 1, 2, ..., N may be referred to as values of the jth field or jth attribute. When every field represents a physically distinct attribute and every record contains one value of each attribute, then the file is referred to as a Simple Formatted file. Thus in a simple formatted file, all the $m_i$'s are equal.
In a large file, the values of an attribute will not always be distinct. There are some attributes for which every record will have a distinct value; but in most practical situations, this will not be true. Thus the frequency distribution of values of one or more attributes will be a useful statistic in many problems relating to information storage and retrieval.

The ability to retrieve segments of information, when desired, is one of the most important aspects of storing information in a computer system. The process of specifying the subset of the file to be retrieved is called querying. A query essentially specifies a subset of the file by a series of conditional statements, and the retrieval process consists of retrieving all records which satisfy those statements. The information specified by a query may relate to the key field or to values of other fields; e.g., retrieve all records whose keys lie between K' and K'', or retrieve all records in which the jth attribute has the value $v_1$, or $v_2$, or $v_3$, etc.

The query structure imposes a frequency distribution on a file, which may be explained as follows. Consider a simple formatted file with attributes $A_1, A_2, \ldots, A_m$. The attribute $A_j$ can take $n_j$ values, $j = 1, 2, \ldots, m$. Thus the total number of different types of records possible is $\prod_{j=1}^{m} n_j = n$. Though n types of records are possible, the file may contain fewer records. If the file contains more than n records, then there are some records which are identical with respect to the m attributes $A_1, A_2, \ldots, A_m$; but there are other information units in the records which make them distinct. Let

$$f(A_1 = v_1,\ A_2 = v_2,\ \ldots,\ A_m = v_m) \qquad (1.1)$$

denote the number of records for which the attribute $A_1$ takes the value $v_1$, $A_2$ takes the value $v_2$, ..., $A_m$ takes the value $v_m$. The frequency functions (1.1) characterize the frequency distribution of the records in the file. If the queries specify values of attributes, then the frequency distribution of queries can be derived from (1.1); e.g., suppose a query specifies

$$A_1 = v_1,\ A_2 = v_2,\ \ldots,\ A_i = v_i;$$

then the frequency of this query is given by

$$\sum_i f(A_1 = v_1,\ A_2 = v_2,\ \ldots,\ A_m = v_m),$$

where $\sum_i$ is the sum over all values of the attributes $A_{i+1}, A_{i+2}, \ldots, A_m$.
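The derivation of a query frequency from (1.1) can be illustrated with a short sketch. It assumes that records are given as tuples of attribute values $(v_1, \ldots, v_m)$ and that the query fixes the first i attributes; the function names and sample data are hypothetical.

```python
from collections import Counter


def frequency_table(records):
    """Tabulate f(A_1 = v_1, ..., A_m = v_m) for every value pattern present."""
    return Counter(tuple(record) for record in records)


def query_frequency(table, specified):
    """Frequency of a query that fixes the first len(specified) attributes.

    The remaining attributes are summed out, as in the text.
    """
    i = len(specified)
    return sum(count for values, count in table.items()
               if values[:i] == tuple(specified))


# Hypothetical file with m = 3 attributes.
records = [
    ("a", "x", 1),
    ("a", "x", 2),
    ("a", "y", 1),
    ("b", "x", 1),
    ("a", "x", 1),
]
table = frequency_table(records)
print(query_frequency(table, ("a", "x")))  # 3
print(query_frequency(table, ("a",)))      # 4
```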


One of the main purposes of organizing records in a file is to reduce the time needed to retrieve the records pertinent to queries. The problem becomes more complex because of some uncontrollable factors, such as the addition of new records and new queries, which have to be taken into account. In a dynamic environment, as the size of the file grows, the cost of reorganizing the records becomes large. In most cases, the same record is pertinent to more than one query; hence the problem of additional storage space becomes another restriction on organization. The two important factors, storage space and retrieval time, act in opposite directions in file organization. Thus, trying to reduce one of these factors leads to an increase in the other. The two extreme situations are reflected in the Query Inverted File Organization and the Natural Storage Organization.


An appropriate meaning of the term Query Inverted File is "to reorganize the records in such a manner that certain types of information units can be regarded as identification units of the records." Thus, a simple formatted file can be inverted with respect to the values of one attribute, or combinations of values of one or more attributes, or with respect to a set of queries. Suppose there are k queries represented by $q_1, q_2, \ldots, q_k$. Then an Inverted File with respect to these queries is constructed in the following manner. All records which satisfy the query $q_i$ (i = 1, 2, ..., k) are stored in adjacent locations, with $q_i$ as the identification label for all these records. It is obvious that if a record qualifies for more than one query, then it will have to be stored more than once. Hence redundant storage space is needed. As records pertaining to any one query are stored adjacently, retrieval time is a minimum. Thus in the Query Inverted File, storage space is sacrificed for retrieval time.
In the Natural Storage Organization, the records are stored in the same order as they are added to the file, i.e., there is no special type of organization. Thus the storage space needed is a minimum. When records pertaining to a query are to be retrieved, the query is matched with every record in the file to determine the pertinent records. Thus the retrieval time is a maximum.

Other file organization methods try to balance storage space against retrieval time by using other techniques. One of these methods, usually referred to as the chaining technique, will be discussed in detail in this paper.

The main concept underlying the chaining technique is to link records pertaining to a query by link fields. The link field contains the location of the next record pertaining to that query. Thus the necessity of scanning all records to find pertinent records can be eliminated. When a record qualifies for more than one query, redundant storage of records can also be avoided. Storage space for the chaining technique is larger than for the Natural Storage Organization because of the link fields, and the retrieval time is more than for the Inverted File because the records are not stored in spatial proximity. In some situations the chaining technique can be very complex because it depends not only on the pertinent records but also on the physical locations of the records in the storage system. There are many methods for reducing the complexity of the chaining technique, but this can be achieved only by increasing the retrieval time or storage space or a combination of both. One such method is grouping records into buckets and then confining chaining to within buckets. Details of such methods will be discussed in the latter part of the paper.
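The idea of link fields can be made concrete with a small in-memory sketch. Everything here is illustrative: records are stored once in a list whose indexes stand in for storage locations, each record carries one link field per query it satisfies, and `None` plays the role of a termination marker. The names `build_chains` and `retrieve` are assumptions, not taken from the paper.

```python
def build_chains(records, queries):
    """Chain the records pertinent to each query through link fields.

    records: list of dicts; queries: dict mapping a query name to a predicate.
    Returns (heads, links): heads[name] is the location of the first pertinent
    record, and links[loc][name] is the location of the next one (or None).
    """
    heads = {name: None for name in queries}
    links = [dict() for _ in records]
    last = {}
    for loc, record in enumerate(records):
        for name, satisfies in queries.items():
            if satisfies(record):
                if name in last:
                    links[last[name]][name] = loc   # fill previous link field
                else:
                    heads[name] = loc               # first record of the chain
                links[loc][name] = None             # provisional end of chain
                last[name] = loc
    return heads, links


def retrieve(name, heads, links):
    """Follow the chain for one query without scanning the whole file."""
    found, loc = [], heads[name]
    while loc is not None:
        found.append(loc)
        loc = links[loc][name]
    return found


records = [
    {"dept": "A", "age": 30},
    {"dept": "B", "age": 45},
    {"dept": "A", "age": 50},
    {"dept": "B", "age": 20},
]
queries = {
    "dept_A": lambda r: r["dept"] == "A",
    "over_40": lambda r: r["age"] > 40,
}
heads, links = build_chains(records, queries)
print(retrieve("dept_A", heads, links))   # [0, 2]
print(retrieve("over_40", heads, links))  # [1, 2]
```

Record 2 satisfies both queries but is stored only once; it simply carries two link fields, which is the storage trade-off the text describes.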

The concepts discussed in this section are fundamental concepts of file organization. They have been introduced in the field by many researchers, and the original inventors cannot be identified. Hence no attempt has been made to attribute them to specific publications in the preceding discussion. However, some references in which these concepts may be traced are: Gray et al (1961), Buchholz (1963), Baker (1963), Davis and Lin (1965), IBM Report (1967), Abraham, Ghosh, and Ray-Chaudhuri (1968), etc.
2. RELEVANT MEASURE-SPACE FOR STORAGE


In this section we shall introduce the structure of the space of events and the measure functions that have to be defined for calculating storage space when the chaining technique of organization is used. Let the set of queries be denoted by $q_1, q_2, \ldots, q_k$. Each record can be classified into two classes with respect to a query; namely, whether the record satisfies the query or not. Without any loss of generality, the symbol $q_i$ can be used to denote the event that a record satisfies the query $q_i$ and $\bar{q}_i$ to denote the event that a record does not satisfy the query $q_i$. Thus the binary event space of the query $q_i$ can be denoted by $Q_i = \{q_i, \bar{q}_i\}$. Consider the k-dimensional product space denoted by

$$S = Q_1 \times Q_2 \times \cdots \times Q_k.$$

An element of this product space is denoted by $\beta = (\beta_1, \beta_2, \ldots, \beta_k)$, where $\beta_i$ can take two values $q_i$ and $\bar{q}_i$. The element $\beta$ represents the classification of a record with respect to the set of k queries; e.g., if $\beta = (q_1, \bar{q}_2, \bar{q}_3, q_4)$, it means that the particular record corresponding to $\beta$ satisfies the query $q_1$, does not satisfy the queries $q_2$ and $q_3$, but satisfies $q_4$. In order to define any measure over the product space S, sigma-fields ($\sigma$-fields) have to be introduced. As each $Q_i$ is a binary field, the $\sigma$-field over $Q_i$ contains $\sigma_i = \{q_i, \bar{q}_i, \phi_i, \Omega_i\}$, where $\phi_i$ is the event that a record has no classification with respect to the query $q_i$ and $\Omega_i$ is the event that the record may or may not satisfy the query $q_i$. Thus the $\sigma$-field over S ($\sigma\{S\}$) is the product space of the $\sigma$-fields over the $Q_i$'s, i.e., $\sigma\{S\} = (\sigma_1, \sigma_2, \ldots, \sigma_k)$. For every $\alpha \in \sigma\{S\}$, $f(\alpha)$ will denote the number of records in the file for which the compound event $\alpha$ is satisfied.


Example 2.1

Suppose there are 4 queries denoted by $q_1$, $q_2$, $q_3$, and $q_4$. Then S contains $2^4$ points of the form $\beta = (\beta_1, \beta_2, \beta_3, \beta_4)$, where $\beta_i = q_i$ or $\bar{q}_i$, i = 1, 2, 3, 4. $\sigma\{S\}$ contains $4^4$ points of the following type:

$$\alpha = (\alpha_1, \alpha_2, \alpha_3, \alpha_4),$$

where $\alpha_i$ can take any of the four values $q_i$, $\bar{q}_i$, $\phi_i$, and $\Omega_i$ for i = 1, 2, 3, 4. Thus an event of the type $(q_1, \bar{q}_2, \phi_3, \Omega_4)$ represents a record which satisfies the query $q_1$, does not satisfy the query $q_2$, has no classification with respect to the query $q_3$, and may or may not satisfy the query $q_4$. $f(\alpha)$ indicates the number of records in the file which have the above specification.
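The counts in Example 2.1 and the frequency function $f(\alpha)$ can be checked with a brief sketch. The string labels and the representation of a record's classification as a tuple of booleans are assumptions made for illustration; in particular, the sketch assumes every record is classified with respect to every query, so an event containing the no-classification label matches no record.

```python
from itertools import product

SAT, NOT, NONE, ANY = "q", "not_q", "phi", "omega"  # illustrative labels


def matches(beta, alpha):
    """True if a record classified as beta satisfies the compound event alpha.

    beta: tuple of booleans (does the record satisfy q_i?).
    alpha: tuple of labels drawn from SAT, NOT, NONE, ANY.
    """
    for satisfied, label in zip(beta, alpha):
        if label == NONE:                 # assumption: no record is unclassified
            return False
        if label == SAT and not satisfied:
            return False
        if label == NOT and satisfied:
            return False
    return True                           # ANY places no restriction


def f(alpha, classifications):
    """Number of records whose classification satisfies alpha."""
    return sum(1 for beta in classifications if matches(beta, alpha))


k = 4
print(len(list(product([True, False], repeat=k))))           # 16 points in S
print(len(list(product([SAT, NOT, NONE, ANY], repeat=k))))   # 256 points

# Hypothetical classifications of three records against q_1, ..., q_4.
classifications = [
    (True, False, True, False),
    (True, True, False, False),
    (False, False, False, True),
]
print(f((SAT, ANY, ANY, ANY), classifications))  # 2
print(f((SAT, NOT, ANY, ANY), classifications))  # 1
```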

Simple Formatted Files

In the simple formatted file, any record can be classified into three states with respect to any query; namely, $q_i$, $\bar{q}_i$, and $\Omega_i$. Thus for a simple formatted file, $\phi_i$ does not appear in any co-ordinate position of $\sigma\{S\}$; hence, it will not be discussed in the present context.

In this representation, the frequency of records pertinent to a query $q_i$ is denoted as

$$f(\Omega_1, \Omega_2, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k). \qquad (2.1)$$

Thus the frequency of the records which are pertinent to the queries $q_{i_1}, q_{i_2}, \ldots, q_{i_l}$ is denoted by

$$f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, q_{i_l}, \ldots, \Omega_k). \qquad (2.2)$$

For calculating the storage space needed for the chaining technique, it will be assumed that the records are of equal length. The length of a record will be denoted by r. In the chaining technique, duplicate storage of records is avoided by using link fields. Hence, if the query set is given by $q_1, q_2, \ldots, q_k$, then the frequency of records in the file organization is given by

$$f(B) = f\Big[\bigcup_{i=1}^{k} (\Omega_1, \Omega_2, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k)\Big]. \qquad (2.3)$$

The symbol f(B) is used to represent (2.3) because in many file organizations the chaining is done only within a bucket; hence if $q_1, q_2, \ldots, q_k$ are the queries pertinent to a bucket, then (2.3) gives the frequency of a bucket. (2.3) can be expressed in terms of (2.1) and (2.2) in the following manner:

If exclusive union is denoted by + and difference by -, then
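$$\begin{aligned}
&(q_1, \Omega_2, \ldots, \Omega_k) \cup (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) \\
&\quad = (q_1, \Omega_2, \ldots, \Omega_k) + [(\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) - (q_1, \Omega_2, \ldots, \Omega_k) \cap (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k)] \\
&\quad = (q_1, \Omega_2, \ldots, \Omega_k) + (\bar{q}_1, \Omega_2, \ldots, \Omega_k) \cap (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k).
\end{aligned}$$

Thus,

$$\begin{aligned}
&f[(q_1, \Omega_2, \ldots, \Omega_k) \cup (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k)] \\
&\quad = f(q_1, \Omega_2, \ldots, \Omega_k) + f[(\bar{q}_1, \Omega_2, \ldots, \Omega_k) \cap (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k)] \\
&\quad = f(q_1, \Omega_2, \ldots, \Omega_k) + f(\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) - f(q_1, q_2, \Omega_3, \ldots, \Omega_k). \qquad (2.4)
\end{aligned}$$

Similarly,

$$\begin{aligned}
&(q_1, \Omega_2, \ldots, \Omega_k) \cup (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) \cup (\Omega_1, \Omega_2, q_3, \Omega_4, \ldots, \Omega_k) \\
&\quad = (q_1, \Omega_2, \ldots, \Omega_k) + (\bar{q}_1, \Omega_2, \ldots, \Omega_k) \cap (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) \\
&\qquad + (\bar{q}_1, \Omega_2, \ldots, \Omega_k) \cap (\Omega_1, \bar{q}_2, \Omega_3, \ldots, \Omega_k) \cap (\Omega_1, \Omega_2, q_3, \Omega_4, \ldots, \Omega_k).
\end{aligned}$$

Thus,

$$\begin{aligned}
&f[(q_1, \Omega_2, \ldots, \Omega_k) \cup (\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) \cup (\Omega_1, \Omega_2, q_3, \Omega_4, \ldots, \Omega_k)] \\
&\quad = f(q_1, \Omega_2, \ldots, \Omega_k) + f(\Omega_1, q_2, \Omega_3, \ldots, \Omega_k) + f(\Omega_1, \Omega_2, q_3, \Omega_4, \ldots, \Omega_k) \\
&\qquad - f(q_1, q_2, \Omega_3, \ldots, \Omega_k) - f(q_1, \Omega_2, q_3, \Omega_4, \ldots, \Omega_k) - f(\Omega_1, q_2, q_3, \Omega_4, \ldots, \Omega_k) \\
&\qquad + f(q_1, q_2, q_3, \Omega_4, \ldots, \Omega_k). \qquad (2.5)
\end{aligned}$$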


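Similarly, by induction it can be shown that

$$\begin{aligned}
f(B) &= f\Big[\bigcup_{i=1}^{k} (\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k)\Big] \\
&= \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k)
 - \sum_{i_1 < i_2} f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, \Omega_k) \\
&\quad + \sum_{i_1 < i_2 < i_3} f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, q_{i_3}, \ldots, \Omega_k)
 - \cdots + (-1)^{k+1} f(q_1, q_2, \ldots, q_k). \qquad (2.6)
\end{aligned}$$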


If (2.6) is multiplied by the length of a record, i.e., r, then the space occupied by the records, excluding the space occupied by link fields, can be obtained. If the length of the link field is assumed to be constant and equal to $\ell$, then the space occupied by the link fields is given by

$$\ell \sum_{i=1}^{k} [f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) - 1]. \qquad (2.7)$$

It should be noted that though no link field is needed for the last record pertaining to a query, a termination field is needed to signal the end of the search. If the length of the termination field is denoted by $\ell_1$, then the space needed by the termination fields is given by:

$$k \ell_1. \qquad (2.8)$$

Hence from (2.6), (2.7), and (2.8) the storage space needed for the chaining technique is obtained as:

$$S(C) = r f(B) + \ell \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) + k(\ell_1 - \ell). \qquad (2.9)$$
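The inclusion-exclusion expansion (2.6) and the storage formula (2.9) can be exercised numerically. The sketch below assumes each query is represented by the set of identifiers of the records pertinent to it, with every query having at least one pertinent record; the function names and the field lengths used in the example are hypothetical.

```python
from itertools import combinations


def joint_frequency(query_sets, indexes):
    """Number of records pertinent to every query named in indexes."""
    return len(set.intersection(*(query_sets[i] for i in indexes)))


def bucket_frequency(query_sets):
    """f(B) computed by the alternating sums of (2.6)."""
    k = len(query_sets)
    total = 0
    for size in range(1, k + 1):
        sign = 1 if size % 2 == 1 else -1
        total += sign * sum(joint_frequency(query_sets, indexes)
                            for indexes in combinations(range(k), size))
    return total


def chaining_storage(query_sets, r, link, term):
    """S(C) as in (2.9): records, link fields, and termination fields."""
    k = len(query_sets)
    pertinent = sum(len(s) for s in query_sets)
    return r * bucket_frequency(query_sets) + link * pertinent + k * (term - link)


# Hypothetical query sets over record identifiers 1..5.
query_sets = [{1, 2, 3}, {3, 4}, {5}]
print(bucket_frequency(query_sets))                          # 5
print(chaining_storage(query_sets, r=100, link=4, term=2))   # 518
```

In this example the five distinct records occupy 500 units, the three link fields (one fewer than the chain length for each query) occupy 12, and the three termination fields occupy 6, which agrees with the value returned.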
The formula (2.6) becomes simple in some special cases, which are given below:


Special Case I: Frequency distribution of queries is uniform and the joint frequencies are products of individual frequencies.

Such situations may occur sometimes. One example of such a situation will be when the distribution of the records with respect to all patterns of values of attributes is uniform and the query set consists of queries which specify values of an equal number of disjoint attributes. For such a situation,

$$f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) = N/k \quad \text{for all } i,$$

$$f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, \Omega_k) = N/k^2 \quad \text{for all } i_1 \text{ and } i_2, \text{ and so on;}$$

hence

$$f(B) = N\Big[1 - \frac{(k-1)}{2!\,k} + \frac{(k-1)(k-2)}{3!\,k^2} - \cdots + (-1)^{k+1} \frac{1}{k^k}\Big]. \qquad (2.10)$$

Hence

$$S(C) = rN\Big[1 - \frac{(k-1)}{2!\,k} + \frac{(k-1)(k-2)}{3!\,k^2} - \cdots + (-1)^{k+1} \frac{1}{k^k}\Big] + N\ell + k(\ell_1 - \ell). \qquad (2.11)$$


Special Case II: Invariance of proportion of pertinent records with respect to queries in any subset of the file.

In such situations, the frequency distribution of the queries can be stated as

$$f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) = N p_i, \quad \text{where } 0 < p_i < 1,$$

and

$$f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, q_{i_l}, \ldots, \Omega_k)
= N \prod_{j=1}^{l} \frac{f(\Omega_1, \ldots, \Omega_{i_j - 1}, q_{i_j}, \Omega_{i_j + 1}, \ldots, \Omega_k)}{N}
= N \prod_{j=1}^{l} p_{i_j}.$$

Thus

$$f(B) = N\Big[\sum_{i=1}^{k} p_i - \sum_{i_1 < i_2} p_{i_1} p_{i_2} + \sum_{i_1 < i_2 < i_3} p_{i_1} p_{i_2} p_{i_3} - \cdots + (-1)^{k+1} p_1 p_2 \cdots p_k\Big].$$

Hence

$$S(C) = rN\Big[\sum_{i=1}^{k} p_i - \sum_{i_1 < i_2} p_{i_1} p_{i_2} + \sum_{i_1 < i_2 < i_3} p_{i_1} p_{i_2} p_{i_3} - \cdots + (-1)^{k+1} p_1 p_2 \cdots p_k\Big] + \ell N \sum_{i=1}^{k} p_i + k(\ell_1 - \ell). \qquad (2.12)$$


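Special Case III: Disjoint queries with respect to records.

An example of such a situation is IBM's ISAM file organization. The records belonging to the query $q_i$ may be denoted by $\rho(q_i)$. Thus disjoint queries implies

$$\rho(q_i) \cap \rho(q_j) = \phi \quad \text{for } i \neq j,$$

where $\phi$ is the empty set. Thus

$$f(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_l}, \ldots, \Omega_k) = 0 \quad \text{for } l = 2, 3, \ldots, k.$$

Hence

$$f(B) = \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k).$$

So

$$S(C) = (r + \ell) \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) + k(\ell_1 - \ell). \qquad (2.13)$$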


Special Case IV: Nested Queries.

A set of queries $q_1, q_2, \ldots, q_k$ is called nested if there exists a query $q_j$ among them for which $\rho(q_j) \supseteq \rho(q_i)$ for all i.

Such situations are found in many practical cases; and in many file organizations with bucket arrangements, this property of queries may be used for reducing storage space. In such a situation

$$f(B) = f(\Omega_1, \ldots, \Omega_{j-1}, q_j, \Omega_{j+1}, \ldots, \Omega_k).$$

Thus

$$S(C) = r f(\Omega_1, \ldots, \Omega_{j-1}, q_j, \Omega_{j+1}, \ldots, \Omega_k) + \ell \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) + k(\ell_1 - \ell). \qquad (2.14)$$
Special Case V: Disjoint nested queries.

In this situation, the query set is such that it can be divided into L groups (of any size), which are denoted by $Q'_1, Q'_2, \ldots, Q'_L$, with the following properties:

(i) If $q_i \in Q'_l$ and $q_j \in Q'_{l'}$, then $\rho(q_i) \cap \rho(q_j) = \phi$ for every element of $Q'_l$ and $Q'_{l'}$ when $l \neq l'$.

(ii) In every group $Q'_l$ there exists a $q_{j_l}$ such that $\rho(q_{j_l}) \supseteq \rho(q_i)$ for every $q_i \in Q'_l$.

In such a situation

$$f(\Omega_1, \ldots, q_{j_1}, \ldots, q_{j_2}, \ldots, q_{j_l}, \ldots, \Omega_k) = 0 \quad \text{for } l = 2, \ldots, L.$$

Hence

$$f(B) = \sum_{l=1}^{L} f(\Omega_1, \ldots, \Omega_{j_l - 1}, q_{j_l}, \Omega_{j_l + 1}, \ldots, \Omega_k)$$

and

$$S(C) = r \sum_{l=1}^{L} f(\Omega_1, \ldots, \Omega_{j_l - 1}, q_{j_l}, \Omega_{j_l + 1}, \ldots, \Omega_k) + \ell \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) + k(\ell_1 - \ell). \qquad (2.15)$$


It would be interesting to compare the storage space for the Inverted File Organization with that of the chaining technique. For the Inverted File Organization, the storage space is

$$S(I) = r \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k). \qquad (2.16)$$

For comparing (2.16) with (2.6) we will assume that $\ell_1 = \ell$, because in most practical situations the difference will be very small. Thus

$$S(I) - S(C) = r\Big[\sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) - f(B)\Big] - \ell \sum_{i=1}^{k} f(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k), \qquad (2.17)$$

where f(B) is given by (2.6).

In general, it is difficult to say whether (2.17) is positive or negative; but in special cases, some conclusions can be arrived at. If the Inverted File is constructed on addresses of records and the chaining technique is also performed on addresses, then the length of the address field may be almost equal to or less than that of a link field. When $\ell \geq r$, (2.17) is negative; hence the storage space is less for the Inverted File than for the chaining technique. In many situations, the Inverted File and chaining technique are applied to the records directly. In such cases $r \gg \ell$; if r is large in comparison to $\ell$, the storage space for the Inverted File will be greater than that for the chaining technique. The actual switching point will depend on the frequency distribution of the queries.
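The two regimes described above can be seen in a small numerical sketch. It assumes queries are represented as sets of record identifiers, takes the termination field to have the same length as the link field, and computes f(B) directly as the size of the union rather than through the alternating sums; all names and numbers are illustrative.

```python
def inverted_storage(query_sets, r):
    """S(I): every record is stored once for each query it satisfies."""
    return r * sum(len(s) for s in query_sets)


def chaining_storage(query_sets, r, link, term):
    """S(C): distinct records once, plus link and termination fields."""
    distinct = len(set().union(*query_sets))        # f(B), by direct union
    pertinent = sum(len(s) for s in query_sets)     # sum of the f(..q_i..)
    return r * distinct + link * pertinent + len(query_sets) * (term - link)


query_sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]

# Whole records are stored: r is large compared with the link length.
large_r = inverted_storage(query_sets, r=100) - chaining_storage(
    query_sets, r=100, link=4, term=4)
print(large_r)   # 364, chaining needs less space

# Only addresses are stored: r does not exceed the link length.
small_r = inverted_storage(query_sets, r=4) - chaining_storage(
    query_sets, r=4, link=4, term=4)
print(small_r)   # -20, the inverted file needs less space
```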

3. HIERARCHICAL FILES

In hierarchical files, a record contains two distinct types of segments. One part is called the Header, or Master, or Level 0, or primary segment; and the other part is composed of a number of smaller segments which are referred to as Repeated segments or subordinate segments or Level 1, 2, etc. The primary segment has a simple formatted structure with pointers or links pointing to the subordinate segments. For simplicity, it will be assumed that there is only one subordinate level, which may contain repeated information; e.g., the primary segment may contain information relating to the head of a family and the subordinate level may contain formatted information about the members of the family.

In hierarchical files, a query may relate to primary information, or subordinate information, or a combination of both. Thus, in chaining techniques, it may be necessary sometimes to link a subset of subordinate information of one record with that of another record. Hence each unit of subordinate information must have its own identification label or key. Needless to say, each primary segment has an identification label. The identification of a unit of subordinate information usually contains the identification of the primary segment and some additional bits for its own identification. The interaction of a query with a record in a hierarchical file can be complex. Sometimes only the primary segment may be pertinent to a query, and other times only some units of the subordinate segment may be relevant. Thus for the chaining technique, link fields have to be attached to each unit of the segments. The lengths of the primary segment and of the units of the subordinate segment may be different; hence for calculating storage space it is necessary to have the frequency distribution of both the primary segments and the units of the subordinate segments with respect to the queries.


The event space and the $\sigma$-field associated with it are the same as in the simple formatted file. The difference is that now a bivariate frequency function is associated with each element of the $\sigma$-field; one for the primary segments and one for the units of the subordinate segments. Thus the two frequency functions associated with the event $(\Omega_1, \Omega_2, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k)$ are:

$$f_0(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) = f_{0i} \ \text{(say)} \qquad (3.1)$$

and

$$f_1(\Omega_1, \ldots, \Omega_{i-1}, q_i, \Omega_{i+1}, \ldots, \Omega_k) = f_{1i} \ \text{(say)}, \qquad i = 1, 2, \ldots, k, \qquad (3.2)$$

where (3.1) gives the number of primary segments relevant to the query $q_i$ and (3.2) gives the number of units of subordinate segments relevant to $q_i$.

There can be links between primary segments of two records and also links between units of subordinate segments of two records; thus frequency functions over joint events also have to be defined. They are as follows:

$$f_0(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, q_{i_l}, \ldots, \Omega_k) = f_{0 i_1 i_2 \ldots i_l} \ \text{(say)} \qquad (3.3)$$

and

$$f_1(\Omega_1, \ldots, q_{i_1}, \ldots, q_{i_2}, \ldots, q_{i_l}, \ldots, \Omega_k) = f_{1 i_1 i_2 \ldots i_l} \ \text{(say)} \qquad (3.4)$$

for $l = 2, 3, \ldots, k$.

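Using the same type of set-theoretic calculations as in section 2, the formula for the frequency of segments pertinent to a bucket when the chaining technique is used can be derived. It is given by

$$\begin{aligned}
f_0(B) &= \sum_{i=1}^{k} f_{0i} - \sum_{i_1 < i_2} f_{0 i_1 i_2} + \sum_{i_1 < i_2 < i_3} f_{0 i_1 i_2 i_3} - \cdots + (-1)^{k+1} f_{0123 \ldots k}, \\
f_1(B) &= \sum_{i=1}^{k} f_{1i} - \sum_{i_1 < i_2} f_{1 i_1 i_2} + \sum_{i_1 < i_2 < i_3} f_{1 i_1 i_2 i_3} - \cdots + (-1)^{k+1} f_{1123 \ldots k}. \qquad (3.5)
\end{aligned}$$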


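If $r_0$ denotes the length of a primary segment and $r_1$ the length of a unit of the subordinate segment, then the space occupied by the segments in the chaining technique, excluding the link fields, is given by

$$\begin{aligned}
S(f_0(B)) &= r_0 \Big[\sum_{i=1}^{k} f_{0i} - \sum_{i_1 < i_2} f_{0 i_1 i_2} + \sum_{i_1 < i_2 < i_3} f_{0 i_1 i_2 i_3} - \cdots + (-1)^{k+1} f_{0123 \ldots k}\Big], \\
S(f_1(B)) &= r_1 \Big[\sum_{i=1}^{k} f_{1i} - \sum_{i_1 < i_2} f_{1 i_1 i_2} + \sum_{i_1 < i_2 < i_3} f_{1 i_1 i_2 i_3} - \cdots + (-1)^{k+1} f_{1123 \ldots k}\Big]. \qquad (3.6)
\end{aligned}$$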


As in section 2, if the length of the link fields is assumed to be constant and denoted by $\ell$, and the length of the terminating field by $\ell_1$, then the storage space for the chaining technique in hierarchical files is given by

l  k 

sh(c>  *  s(fM<B))  n.7) 

1-0  i-1 


4.  RETRIEVAL  TIME 


The time needed to retrieve all the records pertinent to a query when the chaining technique has been used can be calculated if the path of the search procedure is known. Suppose the records pertinent to the query $q_i$ $(i = 1, 2, \ldots, k)$ are denoted by $r_{ij}$ $(j = 1, 2, \ldots, f_i)$. The search path for $q_i$ starts with $r_{i1}$ and then moves to $r_{i2}$ and then to $r_{i3}$ and so on. The search terminates with $r_{if_i}$. For simplicity, it will be assumed that the records (or segments of records for hierarchical files) are of fixed length, and the time needed to read a record and the link field associated with it will be denoted by $\tau_1$.

The access time from the record $r_{ij}$ to $r_{ij+1}$, after reading the link field associated with $r_{ij}$, will be denoted by $\tau(r_{ij}, r_{ij+1})$. Assuming that the link fields can be converted into access commands instantly, the time needed to retrieve all records pertinent to the query $q_i$ is given by

$$T(q_i) = \tau_{0i} + \sum_{j=1}^{f_i - 1} \tau(r_{ij}, r_{ij+1}) + f_i \tau_1 \qquad (4.1)$$

where $\tau_{0i}$ is the time needed to reach the first record $r_{i1}$.


The retrieval time for each query can be calculated from (4.1), and the total or average retrieval time for the set of $k$ queries can be calculated from these components. In many practical situations, all queries are not used equally frequently; hence the probability or relative frequency of usage of the queries has to be taken into consideration in calculating the average retrieval time for a set of queries. If the probabilities or relative frequencies of usage of the queries are denoted by $P_1, P_2, \ldots, P_k$, then the average retrieval time is given by

$$T = \sum_{i=1}^{k} P_i \, T(q_i). \qquad (4.2)$$

In the special case when all the queries are used equally frequently, $P_1 = P_2 = \ldots = P_k = 1/k$ and hence

$$T = \sum_{i=1}^{k} T(q_i) / k.$$
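The two formulas above can be sketched in a few lines of Python. This is only an illustration: it assumes (4.1) is read as "time to reach the first record, plus the link-to-link access times, plus one read per record", and the function and parameter names (`query_retrieval_time`, `access_time`, and so on) are hypothetical, not taken from the report.

```python
from typing import Callable, Sequence


def query_retrieval_time(
    chain: Sequence[str],
    first_access: float,
    access_time: Callable[[str, str], float],
    read_time: float,
) -> float:
    """T(q_i) in the sense of (4.1), under the reading stated above.

    chain        -- the records r_i1, ..., r_if in search-path order
    first_access -- time to reach the first record (tau_0i)
    access_time  -- access time between two consecutive records of the chain
    read_time    -- time to read one record and its link field
    """
    hops = sum(access_time(a, b) for a, b in zip(chain, chain[1:]))
    return first_access + hops + len(chain) * read_time


def average_retrieval_time(times: Sequence[float], probs: Sequence[float]) -> float:
    """T of (4.2): the usage-weighted mean of the per-query retrieval times."""
    if len(times) != len(probs):
        raise ValueError("one probability is needed per query")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("usage probabilities must sum to 1")
    return sum(p * t for p, t in zip(probs, times))
```

With equal probabilities `1/k` the second function reduces to the plain mean of the `T(q_i)`, which is the special case stated above.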


In order to calculate $T$ from (4.2) and (4.1), the $\tau(r_{ij}, r_{ij+1})$'s and $\tau_{0i}$'s must be known. In order to calculate these, the positions of the records on the storage hardware must be known or some statistical estimates have to be considered. $\tau_{0i}$ is a special case of $\tau(r_{ij}, r_{ij+1})$; hence it is sufficient to consider the calculation of $\tau(r_{ij}, r_{ij+1})$ only.


It will be assumed that the records are stored consecutively on the storage device and that the chaining is in one direction, i.e., the number of records between $r_{ij}$ and $r_{ij+2}$ is greater than the number of records between $r_{ij}$ and $r_{ij+1}$. Thus the number of records lying between $r_{ij}$ and $r_{ij+1}$ is bounded below by 0 and above by a count fixed by the $f_i$. It will also be assumed that the records are stored on a magnetic disk storage with $\mu$ records per track.


Usually statistical estimates involve fewer unknown measurements than exact expressions; hence a statistical estimate of $\tau(r_{ij}, r_{ij+1})$ is considered first. It will be assumed that the probability of a record being stored on a particular track is the same for all tracks and that within a track the records are uniformly distributed. These assumptions are almost equivalent to a uniform probability distribution of the records over the storage, except for the tracks at the beginning and end.

The number of tracks occupied by all the records is


[i>] 

Li  si  J 


where $[x]$ means the greatest integer contained in $x$. The first track on which $r_{ij}$ can be stored is the $([(j-1)/\mu] + 1)$th track. Similarly the last track on which $r_{ij+1}$ can be stored is


_ 1 

1 _ 

L  -  J 

U 

when 


/ k 

r-  K 

Eh- 

Eh- 

u) 

V  i=  1 

L  i  =  l  J 

a 

(4.3) 


(4.4  | 


When the left side of (4.4) is negative, the last track on which $r_{ij+1}$ can be stored is the last track.


Let 


Under the assumptions already stated, the probability that the magnetic head of the disk storage has to travel $x$ tracks to reach $r_{ij+1}$ from $r_{ij}$ is

where $x = 0, 1, 2, \ldots, \nu_i - 1$.

Let the seek function of the magnetic disk storage be denoted by $\tau_h(x)$, where $x$ is the number of tracks to be travelled and $\tau_h(x)$ is the time needed, measured in some suitable units. There is no explicit functional form for $\tau_h(x)$, though its value for every point has been determined by Monte Carlo methods. Thus the expected seek time for $r_{ij+1}$ from $r_{ij}$ is given by


(v  .  -  x  1 

_ 


.  ( X » 


x=0 


(4.  ") 


Under the assumption that the records are uniformly distributed on a track and that the beginning of a search on a track is a point chosen at random, the search time of a record on a track will be half the rotational time of the disk. Thus if the rotational time of the disk is denoted by $\tau_r$, then the average search time for a record on a track is given by

$$\tfrac{1}{2} \tau_r. \qquad (4.8)$$


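Combining (4.7) and (4.8), a statistical estimate of $\tau(r_{ij}, r_{ij+1})$ can be obtained, which will be denoted by $\hat{\tau}(r_{ij}, r_{ij+1})$. It is the sum of the expected seek time (4.7) and the average search time (4.8), i.e., half the rotational time $\tau_r$ of the disk:

$$\hat{\tau}(r_{ij}, r_{ij+1}) = [\text{expected seek time (4.7)}] + \tfrac{1}{2} \tau_r \qquad (4.9)$$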
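The statistical estimate can be sketched numerically as follows. The expected seek time is taken as a probability-weighted sum of seek times over the possible travel distances, and half a rotation is added for the search on the track. Two things in the sketch are assumptions rather than material from the report: the weights $2(\nu - x)/(\nu(\nu + 1))$, which are merely one distribution consistent with the surviving numerator $(\nu_i - x)$ of (4.7), and the toy seek curve `toy_seek`, which is hypothetical and stands in for measured device data.

```python
from typing import Callable, List, Sequence


def triangular_travel_probs(nu: int) -> List[float]:
    """ASSUMED weights: P(x) = 2(nu - x) / (nu(nu + 1)) for x = 0..nu-1."""
    return [2.0 * (nu - x) / (nu * (nu + 1)) for x in range(nu)]


def expected_seek_time(probs: Sequence[float], seek: Callable[[int], float]) -> float:
    """Sum over x of P(travel x tracks) * seek(x)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("travel probabilities must sum to 1")
    return sum(p * seek(x) for x, p in enumerate(probs))


def estimated_access_time(
    probs: Sequence[float], seek: Callable[[int], float], rotation_time: float
) -> float:
    """Expected seek time plus half a rotation, in the spirit of (4.9)."""
    return expected_seek_time(probs, seek) + rotation_time / 2.0


def toy_seek(x: int) -> float:
    """Hypothetical seek curve, not a measured device characteristic."""
    return 0.0 if x == 0 else 10.0 + 2.0 * x
```

Any other travel distribution can be passed to `estimated_access_time` as a plain list, so the sketch does not depend on the assumed weights.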


Substituting (4.9) in (4.1), an estimate of $T(q_i)$ can be obtained, and thus an estimate of $T$ can be obtained from (4.2).

It is interesting to compare the retrieval time of an Inverted File Organization with (4.2). In an Inverted File all the records pertinent to a query, say $q_i$, are stored contiguously on a disk storage. Thus the retrieval time for all records pertinent to $q_i$, under the assumptions already stated, is given by


(4.10) 


If the probabilities of usage of the queries are taken into consideration, then the average retrieval time for the inverted file is given by

k 

T.  -Y,  h  ’i  "id-  (4' 

i  =  1 


Thus  a  measure  of  the  additional  retrieval  time  that  has  been  paid  as  a  price 
for  saving  of  storage  space  in  chaining  technique  over  Inverted  File  is  given 
by 

k 

T-  Tl  P,  <'«i>  - 

i  =  l 


A statistical estimate of (4.12) can be obtained by replacing $\tau(r_{ij}, r_{ij+1})$ by its estimate given in (4.9).


Exact expression for $\tau(r_{ij}, r_{ij+1})$


In order to calculate an exact expression for $\tau(r_{ij}, r_{ij+1})$, the exact storage locations of $r_{ij}$ and $r_{ij+1}$ have to be known. In a disk storage device, which is a two dimensional storage (as the magnetic head can be switched instantly on any track on the same cylinder, the different tracks on the same cylinder are not considered as an additional dimension), the storage location of a record $r_{ij}$, say $S(r_{ij})$, is determined by two parameters, namely, the track number $(x_{ij})$ and the angular position $(\omega_{ij})$ measured in radians from a fixed point on the track. Thus

$$S(r_{ij}) = (x_{ij}, \omega_{ij}). \qquad (4.13)$$


Thus the storage distance from the record $r_{ij}$ to the record $r_{ij+1}$ is given by

$$S(r_{ij}, r_{ij+1}) = (x_{ij+1} - x_{ij},\ \omega_{ij+1} - \omega_{ij}). \qquad (4.14)$$


$\tau(r_{ij}, r_{ij+1})$ contains two components, the access (seek) time and the search time on the track. The access time is given by

$$\tau_h(x_{ij+1} - x_{ij}). \qquad (4.15)$$


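For calculating the search time, two functions have to be introduced. Let

$F(x)$ = the fractional part of $x$,
$\tau(\omega_1, \omega_2)$ = the time needed for the disk to rotate from the angular position $\omega_1$ to $\omega_2$.

The disk rotates $2\pi$ radians in time $\tau_r$, so while the head is seeking, the disk turns through $2\pi F(\tau_h(x_{ij+1} - x_{ij}) / \tau_r)$ radians beyond whole revolutions. Hence the search time is

$$\tau\!\left[ 2\pi F\!\left( \frac{\tau_h(x_{ij+1} - x_{ij})}{\tau_r} \right) + \omega_{ij},\ \omega_{ij+1} \right]. \qquad (4.16)$$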
Hence,

$$\tau(r_{ij}, r_{ij+1}) = \tau_h(x_{ij+1} - x_{ij}) + \tau\!\left[ 2\pi F\!\left( \frac{\tau_h(x_{ij+1} - x_{ij})}{\tau_r} \right) + \omega_{ij},\ \omega_{ij+1} \right]. \qquad (4.17)$$
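A small Python sketch may make the exact expression concrete. It assumes that angular positions increase in the direction of rotation, that the seek time depends only on the absolute track distance, and that the rotation delay between two angles is the forward angular gap scaled by the rotation time. The names `exact_access_time`, `rotation_delay` and the linear seek function used in the usage note are hypothetical.

```python
import math
from typing import Callable, Tuple

# A storage location in the sense of (4.13): (track number, angle in radians).
Location = Tuple[int, float]


def frac(x: float) -> float:
    """F(x): the fractional part of x."""
    return x - math.floor(x)


def rotation_delay(w_from: float, w_to: float, rotation_time: float) -> float:
    """Time for the disk to rotate forward from angle w_from to angle w_to."""
    gap = (w_to - w_from) % (2.0 * math.pi)
    return gap / (2.0 * math.pi) * rotation_time


def exact_access_time(
    src: Location,
    dst: Location,
    seek: Callable[[int], float],
    rotation_time: float,
) -> float:
    """Seek time plus search time, following the structure of (4.17)."""
    seek_t = seek(abs(dst[0] - src[0]))
    # Angular position under the head once the seek has finished.
    w_after_seek = src[1] + 2.0 * math.pi * frac(seek_t / rotation_time)
    return seek_t + rotation_delay(w_after_seek, dst[1], rotation_time)
```

For instance, with a rotation time of 20 units and a hypothetical seek function `lambda x: 5.0 * x`, moving from `(0, 0.0)` to `(1, 0.0)` costs 5 units of seek, during which the disk turns a quarter revolution, so the head must wait a further three quarters of a revolution.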


SUMMARY 


It has been shown that the problem of file organization can be looked at as a problem in time and space. Retrieval time and storage space are the two important factors. Both of these factors cannot be reduced simultaneously; they move in opposite directions. A file organization technique attempts to balance these two factors. In this paper, these concepts have been brought to light using the "chaining technique" of file organization. Mathematical formulae for retrieval time and storage space needed have been calculated. It has been shown that some of the formulae become simple when statistical assumptions are made. Retrieval time and storage space required in the chaining technique have been compared with those of Inverted File organization. One important aspect which the author would like to emphasize is that it is difficult to combine retrieval time and storage space requirement into one measure of goodness, and this should not be attempted when comparing different file organization techniques.
ABSTRACT: This paper develops theories for constructing filing schemes for formatted files with unequal-valued attributes when the query set contains all queries which specify two values from two attributes. These filing schemes provide a set of buckets for storing accession numbers of records. The retrieval rule is based on identifying a bucket from a query by solving algebraic equations over finite fields. The theories underlying these filing schemes are based on properties of deleted finite geometries. Expressions for retrieval time and storage redundancy are also given.
1. INTRODUCTION

A large volume of data may be stored in many ways in the storage area of a computer. Usually, these are stored in small blocks called records. A record is an information block and can have any structure, but we will discuss the case where this block of information is represented by a vector, each component being a number providing information regarding one of a set of $\ell$ attributes $A_1, A_2, \ldots, A_\ell$. These numbers are called values of the attributes. Each record has a record-identification and is denoted by $i$. Thus, if $v_{ij}$ denotes the value of the $j$th attribute of the $i$th record, then the structure of the record is

$$f(i) = (i;\ v_{i1}, v_{i2}, \ldots, v_{i\ell})$$

The collection of records is called a file. One of the main purposes of storing records in a computer is the ability to recall any subset of the file which meets certain criteria. Such a set of criteria is also called a "query." A query may specify a key, or a collection of keys, or a particular value of an attribute, or a collection of values of different attributes, or a combination of keys and values of attributes. In general, a query may be stated by the user in many different forms, but at the particular stage of the retrieval process when the query has to interact with the file, it has a structure similar to the ones described above. In this paper, we confine ourselves to queries which specify a collection of values of attributes. Thus, we may want to retrieve all records for which the attributes $A_{j_1}, A_{j_2}, \ldots, A_{j_g}$ have the values $v_{j_1}, v_{j_2}, \ldots, v_{j_g}$ $(g \le \ell)$, whatever be the values of the other $\ell - g$ attributes. We shall denote this query by

All records for which the attributes $A_{j_i}$ have the value $v_{j_i}$ for $i = 1, 2, \ldots, g$ are said to satisfy the query or pertain to the query.

Records within a file are organized according to a scheme to reduce the time needed to retrieve pertinent records for a given class of queries. The problem of file organization is fairly simple when queries relate to only one attribute. A summary of this type of work has been given by Buchholz (1963). Prywes et al. (1961) attempted to resolve the problem of minimizing search time for multiple attribute queries by grouping attributes into composite attributes and forming a tree structure, and Davis and Lin suggested the formation of partition classes by considering possible values of logical fields. Abraham, Ghosh and Ray-Chaudhuri (1968) used finite geometry to construct combinatorial filing schemes for binary attributes. Their method consisted of forming groups of records in such a manner that the group containing records pertaining to a given query could be determined algebraically, thus expediting the search. Ray-Chaudhuri (1968) discussed some further combinatorial properties of file organization schemes for binary valued attributes. Ghosh and Abraham (1968) developed the theory for file organization schemes for multiple valued attributes where attributes have an equal number of possible values and the queries specify two values of two attributes. This result was generalized by Abraham, Bose and Ghosh (1967) to the case where the queries specify $t$ $(t \ge 2)$ values of $t$ attributes. In that paper, properties of $t$-independent linear forms over a Galois field $GF(s)$ were used for constructing file organization schemes. In a subsequent preliminary report, Abraham, Bose and Ghosh showed that by using some dependent linear forms along with independent linear forms, filing schemes can be constructed when attributes have unequal numbers of values which are multiples of $s$, where $s$ is a power of a prime number. In this paper it will be shown that when the query set consists of all queries which specify two values from two attributes, then it is possible to develop combinatorial filing schemes if the attributes have unequal numbers of values. These results will be developed in two parts. In the first part, deleted (or full) finite geometries will be used to develop filing schemes when the numbers of values the attributes can take are any multiples of $s$. In the second part, partially deleted geometries will be used to construct filing schemes when the attributes can take any number of values.

2.  UNEQUAL  VALUED  FILING  SCHEMES 

In most computerized filing systems, the records are stored in some relatively slow storage area. The starting address of a segment of the storage device, where the record is stored in its entirety, is called the accession number of the record. A set of comparatively faster memory or storage devices (like the cylinder of a random access disc package) is reserved for storing the accession numbers. Let this set be denoted by $S$.

In this paper, file organization schemes are rules for storing accession numbers of records in $S$, and retrieving the pertinent accession numbers when a query is asked. These rules are referred to as the storage rule and the retrieval rule. In all filing schemes which are in practice today, the retrieval rule consists of a matching operation, whereas in the filing schemes discussed here, the retrieval rule can be algebraic computation only, by proper choice of hardware. In order to achieve this property, the storage rule is developed from structures of some finite geometries. The query set consists of queries which specify two values of two attributes; thus, when $\ell > 2$, an accession number of a record is stored in more than one location in $S$.

The redundant storage of accession numbers can be avoided by using complex chaining techniques, at the cost of increased search time. In the storage rule which is discussed in this paper, a limited amount of chaining is used to reduce the redundant storage of accession numbers without increasing the search time too much. Additional storage requirement and search time formulae for filing schemes are discussed in detail in later sections.

A collection of all queries which specify $t$ values of $t$ different attributes is defined to be a combinatorial query set of order $t$, and will be denoted by $Q_t$.

A Multiple-valued Filing Scheme with parameters $(t, n_1, n_2, \ldots, n_\ell, b)$ is defined to be an arrangement of the accession numbers of records with $\ell$ attributes, where the vector of the number of values these attributes can take is given by $(n_1, n_2, \ldots, n_\ell)$, in $b$ groups (buckets), which are not necessarily mutually exclusive and which satisfy the following properties:

(1) The accession numbers in a bucket are a subset of all accession numbers (property of redundancy).

(2) Associated with each bucket is an algebraic identifier. There is a correspondence between the algebraic identifier of a bucket and the accession numbers in the bucket (property of identifiability).

(3) Corresponding to any query which specifies $t$ $(t \ge 2)$ values of $t$ different attributes there exists a unique bucket (property of uniqueness).

A multiple valued filing scheme corresponding to a combinatorial query set of order $t$ will be denoted by $MVFS_t$.

Ghosh and Abraham (1968) have constructed Balanced Multiple-valued Filing Schemes of order 2. In those filing schemes $n_1 = n_2 = \ldots = n_\ell = s$ and, thus, it was possible to have each bucket containing the accession numbers of an equal number of queries. In $MVFS_t$, when the $n_i$'s are not equal, in general the buckets may not contain the accession numbers of an equal number of queries. In some of the cases discussed in this paper, $MVFS_2$ will be balanced with respect to queries.

The problem of construction of $MVFS_t$ can be considered as a problem in combinatorial algebra which may be stated as follows: given $\ell$ sets of sizes $n_1, n_2, \ldots, n_\ell$, how can $b$ groups be formed such that any subset of $t$ elements from $t$ (of the $\ell$) sets will be contained in one and only one group, and such that it is possible to determine that group algebraically from the subset? No satisfactory mathematical theories are known which will provide a direct answer to this problem, so an attempt has been made to take finite geometries, which have symmetric structures, and delete some portions to provide an answer to this problem.
3. DELETED FINITE GEOMETRIES

An $N$ dimensional finite Euclidean geometry defined over a Galois field $GF(s)$, where $s = p^n$ and $p$ is a prime integer, is denoted by $EG(N, s)$. Points in this geometry are denoted by $N$-tuples of the form $x = (x_1, x_2, \ldots, x_N)$ where $x_i \in GF(s)$. A $t$-dimensional flat in this geometry is defined by a set of points which satisfy $N - t$ independent linear equations with coefficients in $GF(s)$. There are $s^N$ points in this geometry, and any $t$-dimensional flat contains $s^t$ points. The number of $t$-flats in $EG(N, s)$ is equal to $s^{N-t} \phi(N-1, t-1, s)$ where

$$\phi(N, t, s) = \frac{(s^{N+1} - 1)(s^{N} - 1) \ldots (s^{N-t+1} - 1)}{(s^{t+1} - 1)(s^{t} - 1) \ldots (s - 1)}.$$

When some structures from a finite geometry are deleted, then the resulting geometry is called a deleted geometry, e.g., some lines of an $EG(N, s)$ may be deleted. The remaining lines and all the points of the geometry may be used to construct a filing scheme. This technique was used by Ghosh and Abraham (1968) to construct balanced multiple-valued filing schemes of order 2. When some points of the geometry are deleted, then irregular structures are obtained, i.e., all the $t$-flats of the geometry do not have the same number of points. These types of deleted irregular geometries will be called partially deleted geometries.
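The counting function can be checked numerically with a short Python sketch. It takes $\phi(N, t, s)$ as the ratio of products written above and counts the $t$-flats of $EG(N, s)$ as $s^{N-t}\phi(N-1, t-1, s)$, which agrees with the line count $s^{N-1}\phi(N-1, 0, s)$ used in section 4. The function names `phi` and `count_flats` are chosen here for illustration only.

```python
def phi(n: int, t: int, s: int) -> int:
    """phi(N, t, s): product over i = 0..t of (s^(N+1-i) - 1) / (s^(t+1-i) - 1)."""
    num = 1
    den = 1
    for i in range(t + 1):
        num *= s ** (n + 1 - i) - 1
        den *= s ** (t + 1 - i) - 1
    # The quotient is a Gaussian binomial coefficient, so the division is exact.
    return num // den


def count_flats(n: int, t: int, s: int) -> int:
    """Number of t-flats (t >= 1) in EG(n, s), taken as s^(n-t) * phi(n-1, t-1, s)."""
    if not 1 <= t <= n:
        raise ValueError("t must satisfy 1 <= t <= n")
    return s ** (n - t) * phi(n - 1, t - 1, s)
```

For example, `count_flats(2, 1, 3)` gives the 12 lines of $EG(2, 3)$, and `count_flats(3, 1, 3)` gives 117 lines for $EG(3, 3)$.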


Example 3.1

Consider an $EG(2, 3)$. There are 9 points in this geometry which may be represented (without a comma separation between the coordinates) as 00, 01, 02, 10, 11, 12, 20, 21, 22. The 12 lines of the geometry with their algebraic equations are:

Equation: Points
$x_1 = 0$: 00, 01, 02
$x_1 = 1$: 10, 11, 12
$x_1 = 2$: 20, 21, 22
$x_2 = 0$: 00, 10, 20
$x_2 = 1$: 01, 11, 21
$x_2 = 2$: 02, 12, 22
$x_1 + x_2 = 0$: 00, 12, 21
$x_1 + x_2 = 1$: 01, 10, 22
$x_1 + x_2 = 2$: 02, 20, 11
$x_1 + 2x_2 = 0$: 00, 11, 22
$x_1 + 2x_2 = 1$: 10, 21, 02
$x_1 + 2x_2 = 2$: 20, 12, 01

If the points 00, 12, and 21 are deleted from this geometry, then we get a partially deleted geometry whose lines are:

Equation: Points
$x_1 = 0$: 01, 02
$x_1 = 1$: 10, 11
$x_1 = 2$: 20, 22
$x_2 = 0$: 10, 20
$x_2 = 1$: 01, 11
$x_2 = 2$: 02, 22
$x_1 + x_2 = 1$: 01, 10, 22
$x_1 + x_2 = 2$: 02, 20, 11
$x_1 + 2x_2 = 0$: 11, 22
$x_1 + 2x_2 = 1$: 10, 02
$x_1 + 2x_2 = 2$: 20, 01

This partially deleted geometry contains 6 points and 11 lines. 9 of the lines contain 2 points each, and 2 lines contain 3 points each.

If only the 3 lines $x_1 = 0$, $x_1 = 1$ and $x_1 = 2$ are deleted from $EG(2, 3)$, then the resulting geometry is a deleted geometry with 9 lines, each of which contains 3 points.

A spread in a finite geometry is a collection of disjoint flats whose union covers the geometry. In Example 3.1, the 3 lines $x_1 + x_2 = 0$, $x_1 + x_2 = 1$, and $x_1 + x_2 = 2$ form a spread. Each of these three lines is called an element of the spread.
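The counts in Example 3.1 can be reproduced by brute force. The following Python sketch enumerates the lines of $EG(2, 3)$ as point sets, then forms both kinds of deleted geometry. It relies on ordinary arithmetic modulo 3, which is valid only because 3 is prime; a field of prime-power order would need proper Galois field arithmetic. The helper names are illustrative.

```python
from itertools import product

S = 3  # order of the field; arithmetic mod S is valid because 3 is prime


def lines_of_eg2(s: int = S):
    """All lines a1*x1 + a2*x2 = c of EG(2, s), each as a frozenset of points."""
    points = list(product(range(s), repeat=2))
    # One normal vector (a1, a2) per parallel class of lines.
    normals = [(1, a2) for a2 in range(s)] + [(0, 1)]
    return {
        frozenset(p for p in points if (a1 * p[0] + a2 * p[1]) % s == c)
        for a1, a2 in normals
        for c in range(s)
    }


def delete_points(lines, deleted):
    """Partially deleted geometry: drop points, keep lines with >= 2 points left."""
    trimmed = (line - deleted for line in lines)
    return {line for line in trimmed if len(line) >= 2}


def delete_lines_on_spread(lines):
    """Deleted geometry: remove the lines x1 = const (constant first coordinate)."""
    return {line for line in lines if len({p[0] for p in line}) > 1}
```

Deleting the collinear points 00, 12, 21 removes the line $x_1 + x_2 = 0$ entirely and shortens every line that meets it, which is why 9 lines keep 2 points and only the 2 lines parallel to it keep 3.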

4.  MULTIPLE  VALUED  FILING  SCHEMES  AND  DELETED  GEOMETRIES 

Consider an $EG(N, s)$ and a point in this geometry denoted by $(x_1, x_2, \ldots, x_N)$ where $x_i \in GF(s)$. An $(N-1)$-dimensional flat in this geometry is defined by the set of $s^{N-1}$ points which satisfy the equation $a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = c_1$, where the $a_{1j}$'s and $c_1$ are elements of $GF(s)$. If the $a_{1j}$'s are kept fixed and $c_1$ is given the $s$ different values of $GF(s)$, then we get $s$ $(N-1)$-flats which form a spread. Suppose these $s$ $(N-1)$-flats are identified with $s$ attributes and the $s^{N-1}$ points on each of the elements of the spread are identified with the $s^{N-1}$ values which the attributes can take. Thus, a 1-1 correspondence between the geometry and a formatted file with $s$ attributes, where each attribute can take $s^{N-1}$ values, is established. The file can have $s^{s(N-1)}$ distinct records and each record is identified with an $s$-tuple of points in the geometry. There are $s^{N-1} \phi(N-1, 0, s) = s^{N-1}(s^N - 1)/(s - 1)$ lines in the geometry, and each element of the spread $a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = c_1$ $(c_1 \in GF(s))$ contains $s^{N-2} \phi(N-2, 0, s) = s^{N-2}(s^{N-1} - 1)/(s - 1)$ lines. If all the lines which lie completely on the elements of the spread are deleted, then we will get a deleted geometry with

$$s^{N-1}(s^N - 1)/(s - 1) - s \cdot s^{N-2}(s^{N-1} - 1)/(s - 1) = s^{2(N-1)}$$

lines.

The buckets of the filing scheme will be identified with the $s^{2(N-1)}$ lines of the deleted geometry. Each line in $EG(N, s)$ is represented by a system of $N-1$ linearly independent equations. Thus, the matrix of the coefficients of the equations, which has order $(N-1) \times (N+1)$, may be used as a bucket identification. The storage rule for the record $f(i) = (i;\ v_{i1}, v_{i2}, \ldots, v_{is})$ is defined as follows: the accession number of the record $f(i)$ is stored in all buckets which have at least two points in common with the $s$-tuple point-representation of $f(i)$. Inside the bucket the accession numbers are subdivided into $s(s-1)/2$ sub-buckets corresponding to the $s(s-1)/2$ pairs of points of the bucket. An accession number of a record may be entitled to be stored in more than one sub-bucket within a bucket, but in order to reduce redundant storage, an accession number will be stored in only one sub-bucket within a bucket and properly chained (chaining technique) with the other relevant sub-buckets within the bucket. Chaining between buckets is avoided to reduce complications. The same rule is applied to store the accession numbers of all the records in the file.

Example 4.1

Consider an $EG(3, 3)$ and a spread of planes in this geometry. Suppose the spread is represented by the equations $x_1 = 0$, $x_1 = 1$, and $x_1 = 2$. We use this geometry and this spread to construct a filing scheme for a file with 3 attributes, where each attribute can take 9 values. The 3 attributes $A_0$, $A_1$, and $A_2$ will correspond to the 3 elements of the spread, and the values that these attributes can take will correspond to the points on the spread. Let $v^i_j$ be the $j$th value of the attribute $A_i$ $(j = 0, 1, \ldots, 8;\ i = 0, 1, 2)$, and let the correspondence between the values and the points be as follows: the point $(ijk)$ will correspond to the value $v^i_m$ where $m = 3j + k$. The buckets will correspond to the 81 lines of the geometry which do not lie completely in any one of the three planes $x_1 = 0$ or $x_1 = 1$ or $x_1 = 2$. Suppose a record is composed of the values $v^0_3$, $v^1_0$ and $v^2_4$, i.e., $f(i) = (i;\ v^0_3, v^1_0, v^2_4)$. Then the point-representation of this record is $f(i) = (i;\ 010, 100, 211)$. According to the storing rules, the accession number of this record is stored in three buckets corresponding to the three lines: $x_3 = 0$, $x_1 + x_2 = 1$; $x_2 = 1$, $x_1 + x_3 = 0$; and $x_1 + 2x_2 = 1$, $x_1 + x_2 + x_3 = 1$. These three buckets may be identified by their matrices of coefficients as

$$\begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}.$$

Within the bucket with the first of these labels, the accession number of the record $f(i)$ will be stored in the sub-bucket corresponding to the pair 010, 100. This sub-bucket may be given a sub-bucket label 010100 (obtained by concatenating the ordered pair).
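The example can be verified mechanically. The Python sketch below converts attribute values to points with $m = 3j + k$ and builds the line through each pair of points of the record over $GF(3)$. The record values 3, 0 and 4 are the ones implied by the point-representation 010, 100, 211; arithmetic modulo 3 is used because 3 is prime, and the function names are illustrative only.

```python
from itertools import combinations

S = 3  # prime, so integers mod S form the field GF(S)


def value_to_point(i: int, m: int, s: int = S):
    """Value number m of attribute A_i maps to the point (i j k) with m = s*j + k."""
    return (i, m // s, m % s)


def line_through(p, q, s: int = S):
    """The line {p + t(q - p)} through two distinct points, as a frozenset."""
    return frozenset(
        tuple((a + t * (b - a)) % s for a, b in zip(p, q)) for t in range(s)
    )


# Record with values 3, 0, 4 of the attributes A_0, A_1, A_2.
record = [value_to_point(0, 3), value_to_point(1, 0), value_to_point(2, 4)]

# One bucket (line) per pair of points of the record.
buckets = [line_through(p, q) for p, q in combinations(record, 2)]
```

Each of the three lines found this way satisfies one of the three pairs of equations listed in the example, and none of them lies inside a plane $x_1 = \text{const}$, so all three survive in the deleted geometry.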

The storage rule defined above will provide a filing scheme which answers all queries of a combinatorial query set of order 2, i.e., $Q_2$.

The retrieval rule for any query belonging to $Q_2$ will consist of five steps:
(i) The specified values of the attributes will be converted into their point-representation.

(ii) The equation of the line containing the particular pair of points is determined by solving $N$-dimensional algebraic equations over $GF(s)$.

(iii) The bucket corresponding to the line is reached by an appropriate command to the storage unit.

(iv) The appropriate sub-bucket is reached by matching the sub-bucket labels with the query within the bucket.

(v) The accession numbers are retrieved from the sub-bucket and the corresponding records are retrieved from the storage.
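The storage and retrieval rules can be sketched together in Python. This is a simplified illustration under stated assumptions: the field order is the prime 3, a bucket is keyed by the point set of its line rather than by a coefficient matrix, and an accession number is written into every relevant sub-bucket instead of being stored once and chained. The class `FilingScheme` and its methods are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations

S = 3  # prime field order assumed, so arithmetic mod S is field arithmetic


def line_through(p, q, s: int = S):
    """Step (ii): the unique line through two distinct points, as a frozenset."""
    return frozenset(
        tuple((a + t * (b - a)) % s for a, b in zip(p, q)) for t in range(s)
    )


def sub_bucket_label(p, q) -> str:
    """Concatenate the ordered pair of points, e.g. 010 and 100 give '010100'."""
    first, second = sorted((p, q))
    return "".join(str(c) for c in first + second)


class FilingScheme:
    def __init__(self) -> None:
        # bucket (line) -> sub-bucket label -> set of accession numbers
        self.buckets = defaultdict(lambda: defaultdict(set))

    def store(self, accession: int, points) -> None:
        """Storage rule, simplified: one entry per pair of points of the record."""
        for p, q in combinations(points, 2):
            self.buckets[line_through(p, q)][sub_bucket_label(p, q)].add(accession)

    def retrieve(self, p, q):
        """Retrieval rule, steps (ii) to (v), for a query given as two points."""
        bucket = self.buckets.get(line_through(p, q), {})
        return set(bucket.get(sub_bucket_label(p, q), set()))
```

Because the bucket is computed from the query rather than searched for, the only matching operation left is the sub-bucket lookup inside one bucket, which mirrors step (iv).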

In an $EG(N, s)$, through any two points there passes only one line; thus, step (ii) of the retrieval rule will always provide a unique bucket, and hence the property of uniqueness of $MVFS_2$ will be satisfied. In the preceding discussions, it has also been established that the above filing scheme satisfies the properties of redundancy and identifiability. Thus, we have the following theorem:

Theorem 4.1

There exists a $MVFS_2$ with parameters $t = 2$, $\ell = s$, $n_1 = n_2 = \ldots = n_s = s^{N-1}$ and $b = s^{2(N-1)}$, where $s$ is a power of a prime number.

Let the elements of $GF(s)$ be denoted by $\alpha_0, \alpha_1, \ldots, \alpha_{s-1}$. Consider an $EG(N, s)$ and choose an $(N-1)$-dimensional spread in it. Let it be denoted by $a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = c_1$, where the $a_{1j}$'s are fixed and $c_1$ varies over all the $s$ elements of $GF(s)$.

Consider $s_1 \le s$ elements of this spread which are represented by

$$a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = \alpha_0$$
$$a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = \alpha_1 \qquad (4.1)$$
$$\ldots$$
$$a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = \alpha_{s_1 - 1}$$

and associate with them $s_1$ attributes with $s^{N-1}$ values each. The $s^{N-1}$ points on each of these $(N-1)$-dimensional flats represent the values of the attributes.

Consider another $s_2$, where $s_1 + s_2 \le s$, elements of the $(N-1)$-dimensional spread and partition each $(N-1)$-flat into $(N-2)$-dimensional spreads. These may be represented as follows:

$$a_{11} x_1 + a_{12} x_2 + \ldots + a_{1N} x_N = \alpha_{s_1 + j - 1}$$
$$a_{21} x_1 + a_{22} x_2 + \ldots + a_{2N} x_N = c_2 \qquad (4.2)$$

where the $a_{2j}$'s are fixed, $c_2$ varies over all the $s$ elements of $GF(s)$, and $j$ varies over 1 to $s_2$.
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In  (4,2)  the  collection  of  s  (N-2) -flats  for  a  fixed  value 

of  j  is  a  (N-2)  dimensional  spread  of  a  (N- 1 ) -dimens ional  flat. 

Associate  with  these  s,s  (N-2)  flats,  s2s  attributes  with 
N  “  2 

s  vtiues  each.  The  values  of  the  attributes  are  associated 
w'Lh  the  points  of  the  (N-2) -flats.  This  technique  can  be 
repeated  (N-l)  times  and,  thus,  establishing  an  association 
between  a  EG(N,  s)  and  a  file  with  the  following  type  of 
structure : 


$s_1$ attributes have $s^{N-1}$ values each

$s_2 s$ attributes have $s^{N-2}$ values each

$s_3 s^2$ attributes have $s^{N-3}$ values each

. . .                                                      (4.3)

$s_{N-1} s^{N-2}$ attributes have $s$ values each

where $s_1 + s_2 + \dots + s_{N-1} = s$.

In this association, the $s_{i+1} s^{i}$ attributes with $s^{N-i-1}$ values each will be associated with the following (N-i-1)-flats:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N &= \alpha_{i_1} \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2N}x_N &= c_2 \\
&\;\;\vdots \\
a_{i+1,1}x_1 + a_{i+1,2}x_2 + \dots + a_{i+1,N}x_N &= c_{i+1}
\end{aligned}
\tag{4.4}
$$

where the $a_{ij}$'s are fixed, $i_1$ varies between $s_1 + \dots + s_i$ and $s_1 + \dots + s_{i+1}$, and $c_2, c_3, \dots, c_{i+1}$ vary over all elements of GF(s).

A deleted geometry will be constructed from EG(N, s) by deleting all lines which lie completely on any flat which is associated with an attribute, i.e., all the $s_1 s^{N-2}\,\phi(N-2, 0, s)$ lines which lie completely on the (N-1)-flats given in (4.1) will be deleted; the $s_2 s^{N-2}\,\phi(N-3, 0, s)$ lines which lie completely on the $s_2 s$ (N-2)-flats defined in (4.2) will be deleted, and so on. Thus,

$$
s_1 s^{N-2}\phi(N-2,0,s) + s_2 s^{N-2}\phi(N-3,0,s) + s_3 s^{N-2}\phi(N-4,0,s) + \dots + s_{N-2}s^{N-2}\phi(1,0,s) + s_{N-1}s^{N-2}
$$

lines will be deleted from EG(N, s). This deleted geometry will have

$$
s^{N-2}\bigl(s\,\phi(N-1,0,s) - s_1\phi(N-2,0,s) - \dots - s_{N-2}\phi(1,0,s) - s_{N-1}\bigr) \tag{4.5}
$$

lines. Each of these lines will correspond to a bucket in the filing scheme. The storage rule and the retrieval rule for the accession numbers of the records will be the same as for the $\mathrm{MVFS}_2$ with parameters $t = 2$, $l = s$, $n_1 = n_2 = \dots = n_s = s^{N-1}$, $b = s^{2(N-1)}$. If a query specifies two values of two attributes, then the line passing through the point-representation of the query is contained in the deleted geometry described above. If the query represents two values of the same attribute, then the line passing through the point-representation of the query is not contained in the deleted geometry and, hence, there will be no bucket corresponding to that query. Thus, we have the following theorem:


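Theorem 4.2

There exists a $\mathrm{MVFS}_2$ with parameters $t = 2$, $l = s_1 + s_2 s + s_3 s^2 + \dots + s_{N-1}s^{N-2}$, where $s_i \ge 0$ for all $i$ and $s_1 + s_2 + \dots + s_{N-1} = s$, $s$ being the power of a prime number,

$n_1 = n_2 = \dots = n_{s_1} = s^{N-1}$,

$n_{s_1+1} = n_{s_1+2} = \dots = n_{s_1 + s_2 s} = s^{N-2}$,

. . .

$n_k = s$ for $k \ge \sum_{i=1}^{N-2} s_i s^{i-1} + 1$,

and $b = s^{N-2}\bigl[s\,\phi(N-1,0,s) - s_1\phi(N-2,0,s) - \dots - s_{N-2}\phi(1,0,s) - s_{N-1}\bigr]$.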
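The parameters in Theorem 4.2 are easy to tabulate. The following is a minimal sketch rather than the report's own procedure: it assumes that $\phi(m, 0, s)$ denotes the number of points of PG(m, s), that is $(s^{m+1}-1)/(s-1)$, which is consistent with the line counts quoted in the examples. The function names `phi`, `file_structure` and `bucket_count` are hypothetical.

```python
def phi(m, s):
    """Assumed meaning of phi(m, 0, s): (s^(m+1) - 1) / (s - 1)."""
    return (s ** (m + 1) - 1) // (s - 1)

def file_structure(N, s, s_parts):
    """(number of attributes, values each) per group, following (4.3)."""
    return [(s_i * s ** i, s ** (N - 1 - i)) for i, s_i in enumerate(s_parts)]

def bucket_count(N, s, s_parts):
    """b from (4.5); s_parts = (s_1, ..., s_{N-1}) must sum to s."""
    assert len(s_parts) == N - 1 and sum(s_parts) == s
    inner = s * phi(N - 1, s)
    for i, s_i in enumerate(s_parts, start=1):
        inner -= s_i * phi(N - 1 - i, s)  # phi(0, s) = 1 for the last group
    return s ** (N - 2) * inner

print(file_structure(3, 3, (2, 1)))  # [(2, 9), (3, 3)]
print(bucket_count(3, 3, (2, 1)))    # 90
print(bucket_count(3, 3, (3, 0)))    # 81
```

With $s_1 = s$ and all other $s_i = 0$ the count reduces to $s^{2(N-1)}$, the value given in Theorem 4.1.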


Example 4.2

Consider a file which has two attributes with 9 values each and three attributes with 3 values each. The attributes are denoted by $A_0$, $A_1$, $A_2$, $A_3$ and $A_4$. The $j$th value of the $i$th attribute is represented by $v_j^i$ ($j = 0, 1, \dots, 8$ for $i = 0, 1$, and $j = 0, 1, 2$ for $i = 2, 3, 4$). For constructing a $\mathrm{MVFS}_2$ for this file, consider an EG(3, 3). The three planes $x_1 = 0$, $x_1 = 1$, and $x_1 = 2$ form a spread in EG(3, 3). Associate the 9 points of $x_1 = 0$ with the 9 values of $A_0$ and the 9 points of $x_1 = 1$ with the 9 values of $A_1$, i.e., $v_m^0 \leftrightarrow (0jk)$, $v_m^1 \leftrightarrow (1jk)$, where $m = 3j + k$; $j, k = 0, 1, 2$. The three lines $x_1 = 2, x_2 = 0$; $x_1 = 2, x_2 = 1$; and $x_1 = 2, x_2 = 2$ form a spread on the plane $x_1 = 2$. Associate these three lines with the three attributes $A_2$, $A_3$ and $A_4$ and the points on the lines with the values of the attributes in the following manner:

$v_j^2 \leftrightarrow (20j)$, $v_j^3 \leftrightarrow (21j)$, $v_j^4 \leftrightarrow (22j)$, $j = 0, 1, 2$.

EG(3, 3) contains 117 lines, from which the following 27 lines are deleted:

$x_1 = 0,\ x_2 = 0$;        $x_1 = 0,\ x_2 = 1$;        $x_1 = 0,\ x_2 = 2$;
$x_1 = 0,\ x_3 = 0$;        $x_1 = 0,\ x_3 = 1$;        $x_1 = 0,\ x_3 = 2$;
$x_1 = 0,\ x_2 + x_3 = 0$;  $x_1 = 0,\ x_2 + x_3 = 1$;  $x_1 = 0,\ x_2 + x_3 = 2$;
$x_1 = 0,\ x_2 + 2x_3 = 0$; $x_1 = 0,\ x_2 + 2x_3 = 1$; $x_1 = 0,\ x_2 + 2x_3 = 2$;

$x_1 = 1,\ x_2 = 0$;        $x_1 = 1,\ x_2 = 1$;        $x_1 = 1,\ x_2 = 2$;
$x_1 = 1,\ x_3 = 0$;        $x_1 = 1,\ x_3 = 1$;        $x_1 = 1,\ x_3 = 2$;
$x_1 = 1,\ x_2 + x_3 = 0$;  $x_1 = 1,\ x_2 + x_3 = 1$;  $x_1 = 1,\ x_2 + x_3 = 2$;
$x_1 = 1,\ x_2 + 2x_3 = 0$; $x_1 = 1,\ x_2 + 2x_3 = 1$; $x_1 = 1,\ x_2 + 2x_3 = 2$;

$x_1 = 2,\ x_2 = 0$;        $x_1 = 2,\ x_2 = 1$;        $x_1 = 2,\ x_2 = 2$.

Thus, a deleted EG(3, 3) with 90 lines is obtained. The buckets will correspond to these 90 lines. If the accession numbers of the records are stored in these 90 buckets following the same storage rule as in example 4.1, then we will get a $\mathrm{MVFS}_2$ with parameters $t = 2$, $l = 5$, $n_1 = n_2 = 9$, $n_3 = n_4 = n_5 = 3$ and $b = 90$.
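The construction in Example 4.2 is small enough to enumerate directly. The sketch below is an illustration under the assumption that GF(3) is represented by the integers 0, 1, 2 with arithmetic modulo 3; the names `all_lines` and `attribute_flats` are hypothetical.

```python
from itertools import product

P = 3  # GF(3): arithmetic modulo 3

def all_lines():
    """All lines of EG(3, 3), each as a frozenset of three points."""
    pts = list(product(range(P), repeat=3))
    return {
        frozenset(tuple((x + t * d) % P for x, d in zip(pt, dr)) for t in range(P))
        for pt in pts for dr in pts if any(dr)
    }

def attribute_flats():
    """Point sets associated with A0..A4 as in Example 4.2."""
    pts = list(product(range(P), repeat=3))
    flats = {
        "A0": {pt for pt in pts if pt[0] == 0},  # plane x1 = 0
        "A1": {pt for pt in pts if pt[0] == 1},  # plane x1 = 1
    }
    for k, name in enumerate(["A2", "A3", "A4"]):
        flats[name] = {(2, k, j) for j in range(P)}  # line x1 = 2, x2 = k
    return flats

lines = all_lines()
flats = attribute_flats()
# A line is deleted when it lies completely on a flat tied to one attribute.
deleted = {ln for ln in lines if any(ln <= f for f in flats.values())}
buckets = lines - deleted
print(len(lines), len(deleted), len(buckets))  # 117 27 90
```

Every surviving line meets each attribute flat in at most one point, so a bucket never pairs two values of the same attribute.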

5. REDUNDANCY OF MULTIPLE VALUED FILING SCHEMES

In the filing schemes discussed in the previous section, an accession number of a record is stored in more than one bucket, and this will give rise to redundant storage. The average number of times the accession numbers are stored in the filing scheme is called the redundancy of the filing scheme. The redundancy of a filing scheme will depend on the following properties:

(i)  The  frequency  distribution  of  the  different  types  of 
records  in  the  file; 

(ii)  The  patterns  of  values  of  the  attributes  in  the  different 
buckets; 

(iii)  The  number  of  different  values  the  different  attributes 
can  take; 


(iv)  The  types  of  queries  in  the  query  set. 


In deriving the algebraic expressions for the redundancy, it is assumed that the query set is of the type $Q_2$ and the frequency distribution of the different types of records in the file is uniform. Thus, the number of records in the file is some multiple of $\prod_{i=1}^{l} n_i$. As we are calculating the redundancy, it is sufficient to assume that the total number of records is $\prod_{i=1}^{l} n_i$. Suppose the $j$th bucket in the filing scheme has the following type of structure:

$$
A_{j_1} = v_{j_1 k_1},\quad A_{j_2} = v_{j_2 k_2},\quad \dots,\quad A_{j_{k_j}} = v_{j_{k_j} k_{k_j}}
$$

Let us denote by $S_j$ the set of attributes which are represented in the $j$th bucket, i.e.,

$$
S_j = \{A_{j_1}, A_{j_2}, \dots, A_{j_{k_j}}\} \tag{5.1}
$$


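$S_j - A_i$ will denote the set of attributes of $S_j$ from which $A_i$ has been deleted. Then, under the assumption of uniform distribution of records, it is easy to see that the number of accession numbers which will not be stored in the $j$th bucket is given by:

$$
\prod_{i \in S_j}(n_i - 1)\prod_{i \in \bar S_j} n_i \;+\; \prod_{i \in \bar S_j} n_i \sum_{i \in S_j}\ \prod_{i' \in S_j - A_i}(n_{i'} - 1)
$$

where $\bar S_j$ is the complement of the set $S_j$. The first term counts the records that agree with the bucket on no attribute of $S_j$, and the second term counts those that agree on exactly one. The expression is equal to

$$
\prod_{i \in \bar S_j} n_i \prod_{i \in S_j}(n_i - 1)\left[1 + \sum_{i \in S_j}\frac{1}{n_i - 1}\right]
$$

Thus, the total number of accession numbers in the $j$th bucket is

$$
\prod_{i=1}^{l} n_i - \prod_{i \in \bar S_j} n_i \prod_{i \in S_j}(n_i - 1)\left[1 + \sum_{i \in S_j}\frac{1}{n_i - 1}\right]
= \prod_{i=1}^{l} n_i\left[1 - \prod_{i \in S_j}\frac{n_i - 1}{n_i}\left(1 + \sum_{i \in S_j}\frac{1}{n_i - 1}\right)\right]
$$

The total number of accession numbers stored in the filing scheme is:

$$
\prod_{i=1}^{l} n_i \sum_{j=1}^{b}\left[1 - \prod_{i \in S_j}\frac{n_i - 1}{n_i}\left(1 + \sum_{i \in S_j}\frac{1}{n_i - 1}\right)\right]
$$

Thus, the redundancy of the filing scheme is given by:

$$
R = \sum_{j=1}^{b}\left[1 - \prod_{i \in S_j}\frac{n_i - 1}{n_i}\left(1 + \sum_{i \in S_j}\frac{1}{n_i - 1}\right)\right] \tag{5.4}
$$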

Consider the $\mathrm{MVFS}_2$ with parameters $l = s$, $n_1 = n_2 = \dots = n_s = s^{N-1}$, $b = s^{2(N-1)}$. If a file has uniform distribution of records, then the number of accession numbers is $s^{s(N-1)}$. Here $S_j = \{A_1, A_2, \dots, A_s\}$ for all $j$ and $\bar S_j$ = the empty set. Thus, from (5.4), the redundancy is given by

$$
R = s^{2(N-1)}\left[1 - \left(\frac{s^{N-1} - 1}{s^{N-1}}\right)^{s}\left(1 + \frac{s}{s^{N-1} - 1}\right)\right] \tag{5.5}
$$

In the filing scheme discussed in example 4.1, N = 3 and s = 3; thus, the redundancy of that filing scheme is 2.778.
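The redundancy expression (5.4) can be evaluated mechanically once the attributes represented in each bucket are known. The sketch below is illustrative and assumes each bucket is described only by the list of $n_i$ for the attributes in $S_j$ (every $n_i > 1$); `bucket_fraction` and `redundancy` are hypothetical names.

```python
from fractions import Fraction
from math import prod

def bucket_fraction(values_in_bucket):
    """Fraction of all record types stored in one bucket.

    values_in_bucket lists n_i for the attributes i in S_j.
    A record is stored when it matches the bucket on at least two attributes.
    """
    none_match = prod(Fraction(n - 1, n) for n in values_in_bucket)
    correction = 1 + sum(Fraction(1, n - 1) for n in values_in_bucket)
    return 1 - none_match * correction

def redundancy(buckets):
    """R from (5.4): sum of the stored fractions over all buckets."""
    return sum(bucket_fraction(b) for b in buckets)

# Example 4.1 setting: 81 buckets, each holding one value of three 9-valued attributes.
r = redundancy([[9, 9, 9]] * 81)
print(r, round(float(r), 3))  # 25/9 2.778
```

Exact fractions are used so that the result can be compared with the closed form (5.5) without rounding error.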

Consider the $\mathrm{MVFS}_2$ with parameters $l = s_1 + s_2 s + s_3 s^2 + \dots + s_{N-1}s^{N-2}$ and the attributes divided into N-1 groups $(G_1, \dots, G_{N-1})$, where the $i$th group $G_i$ contains $s_i s^{i-1}$ attributes each having $s^{N-i}$ values, $i = 1, 2, \dots, N-1$. It has been shown in theorem 4.2 that $b = s^{N-2}\bigl[s\,\phi(N-1,0,s) - s_1\phi(N-2,0,s) - \dots - s_{N-2}\phi(1,0,s) - s_{N-1}\bigr]$. When the records are uniformly distributed, then the total number of different types of records is $s^{\sum_{i=1}^{N-1} s_i s^{i-1}(N-i)}$. Here

$S_j = \{A_1, A_2, \dots, A_{s_1};\ s_2\ A_i\text{'s from } G_2;\ s_3\ A_i\text{'s from } G_3;\ \dots;\ s_{N-1}\ A_i\text{'s from } G_{N-1}\}$ for all $j$,

$\bar S_j = \{s_2(s-1)\ A_i\text{'s from } G_2;\ s_3(s^2-1)\ A_i\text{'s from } G_3;\ \dots;\ s_{N-1}(s^{N-2}-1)\ A_i\text{'s from } G_{N-1}\}$.

Substituting in (5.4), the redundancy of this $\mathrm{MVFS}_2$ is given by

$$
R = s^{N-2}\left\{s\,\phi(N-1,0,s) - \sum_{i=1}^{N-1} s_i\,\phi(N-i-1,0,s)\right\}
\left\{1 - \prod_{i=1}^{N-1}\left(\frac{s^{N-i} - 1}{s^{N-i}}\right)^{s_i}\left(1 + \sum_{i=1}^{N-1}\frac{s_i}{s^{N-i} - 1}\right)\right\} \tag{5.6}
$$

In the filing scheme discussed in example 4.2, N = 3, s = 3, $l = 5$, $n_1 = n_2 = 9$, $n_3 = n_4 = n_5 = 3$, $s_1 = 2$, $s_2 = 1$, $b = 90$. The number of different types of records in that file under uniform distribution is 2187. The redundancy of that filing scheme is 7.037.
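The closed form (5.6) can be evaluated as a cross-check on the quoted figure. This is a sketch under the same assumption as before, namely that $\phi(m, 0, s) = (s^{m+1}-1)/(s-1)$; `phi` and `redundancy_closed_form` are hypothetical names, and the formula treats every bucket as having the same composition, as the text does.

```python
from fractions import Fraction

def phi(m, s):
    """Assumed meaning of phi(m, 0, s): (s^(m+1) - 1) / (s - 1)."""
    return (s ** (m + 1) - 1) // (s - 1)

def redundancy_closed_form(N, s, s_parts):
    """R from (5.6); s_parts = (s_1, ..., s_{N-1})."""
    b = s ** (N - 2) * (
        s * phi(N - 1, s)
        - sum(s_i * phi(N - 1 - i, s) for i, s_i in enumerate(s_parts, start=1))
    )
    product = Fraction(1)
    total = Fraction(1)
    for i, s_i in enumerate(s_parts, start=1):
        n = s ** (N - i)                    # values per attribute in group G_i
        product *= Fraction(n - 1, n) ** s_i
        total += Fraction(s_i, n - 1)
    return b * (1 - product * total)

r = redundancy_closed_form(3, 3, (2, 1))
print(r, round(float(r), 3))  # 190/27 7.037
```

Setting `s_parts = (3, 0)` reproduces the value 25/9 given by (5.5) for example 4.1.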


6.  RETRIEVAL  TIME 

In the filing schemes discussed, the retrieval time is composed of two components:

(i) Retrieval of the accession numbers of the relevant documents;

(ii) Retrieval of the records when accession numbers are given.

The time needed for these two components is denoted by $T_1$ and $T_2$. $T_1$ and $T_2$ depend on the structure of the file, the lengths of the records, the filing scheme, the hardware system, and which operation of the retrieval procedure is carried out by which component of the computer system. It will be assumed that the main file of records is stored in a random access disc pack and that another disc pack is available for construction of the filing scheme.

The attributes and the values which they can assume have a representation in the computer. The first step in the retrieval procedure consists of converting the query representation to its point representation. This may be achieved by a simple table look-up. Let

$t_1$ = time needed for converting a query to its point representation.

The points are used to determine the algebraic equation and, hence, the bucket label pertinent to the query. This operation is carried out in the central processing unit. Let

$t_2$ = time needed to solve the algebraic equations to determine the bucket label.

It is easy to see that the coefficients of the algebraic equations (properly concatenated) of all the lines of a finite geometry form sets of linearly ordered consecutive elements. In $\mathrm{MVFS}_2$, although not all the lines of the finite geometry correspond to buckets, the coefficients of the algebraic equations of the lines which correspond to buckets can be made to correspond to a set of linearly ordered consecutive elements, because the deleted lines have a systematic pattern. Thus, the buckets can be made to correspond with consecutive tracks on a disc. Different practical situations may necessitate corresponding one bucket to more than one track, or to a fraction of a track, but all these can be taken care of by simple mathematical transformation. Thus, in the retrieval procedure, the computer, after solving the algebraic equation, can give a direct command to the magnetic head of the disc for the exact track location of the bucket. Let

$t_3$ = time needed for locating the bucket $\le$ seek time of the disc pack.
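One way to picture the bucket-label computation is sketched below. It is not the report's procedure: it assumes $s$ is a prime $p$, takes the spread to be the flats $x_1 = c$ as in Theorem 4.1, and uses a hypothetical labelling in which a line is identified by its point on $x_1 = 0$ and its direction scaled so that the first coordinate is 1. The names `encode` and `bucket_label` are illustrative.

```python
def encode(vec, p):
    """Read a vector over GF(p) as a base-p integer."""
    idx = 0
    for c in vec:
        idx = idx * p + c
    return idx

def bucket_label(point_a, point_b, p):
    """Label of the line through two points lying on different flats x_1 = const.

    Points are tuples (x_1, ..., x_N) over GF(p), p prime.
    """
    a, b = point_a[0], point_b[0]
    if a == b:
        raise ValueError("both values belong to the same attribute: no bucket")
    inv = pow((b - a) % p, -1, p)  # modular inverse (Python 3.8+)
    direction = tuple(((q - r) * inv) % p
                      for r, q in zip(point_a[1:], point_b[1:]))
    base = tuple((r - a * d) % p for r, d in zip(point_a[1:], direction))
    n_free = len(base)
    return encode(base, p) * p ** n_free + encode(direction, p)

# Three collinear points of EG(3, 3); any two of them give the same label.
print(bucket_label((0, 1, 2), (1, 2, 0), 3))  # 49
print(bucket_label((1, 2, 0), (2, 0, 1), 3))  # 49
```

Under this labelling the $s^{2(N-1)}$ buckets of Theorem 4.1 receive the consecutive integers 0 to $s^{2(N-1)} - 1$, which could then be mapped to consecutive tracks.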

In the $\mathrm{MVFS}_2$ discussed in section 4, each bucket contains $s(s-1)/2$ sub-buckets. The sub-buckets will correspond to small segments on a track with sub-bucket labels. Thus, the computer can give a direct command to the magnetic head to search a particular sub-bucket and retrieve all the accession numbers in it. Let

$t_4$ = time needed to locate a sub-bucket and retrieve the accession numbers of the pertinent query $\le$ search time on a track.

At times, sub-buckets may be chained. In such situations, the chaining links can be picked up by the magnetic head, ordered, and the retrieval procedure completed in another rotation of the disc. Let

$t_5$ = time needed for tracking chaining, if required.

Thus, $T_1 = t_1 + t_2 + t_3 + t_4 + t_5$        (6.1)


$T_2$ will depend upon the number of records pertinent to a query, which will depend on the distribution of the records in the file. If the distribution of the records is uniform and the distribution of usage of queries is uniform, then an expression for the upper bound of average $T_2$ may be obtained as follows.

Let the average seek time on a disc and the average search time on a disc be given, and let

$S_{j_1 j_2}$ = {set of all the $l$ attributes except $A_{j_1}$ and $A_{j_2}$}.

The average number of records satisfying a query is

$$
n_q = \frac{2}{l(l-1)} \sum_{j_1} \sum_{j_2}\ \prod_{i \in S_{j_1 j_2}} n_i \tag{6.2}
$$

where the double summation is over all $l(l-1)/2$ values of $j_1$ and $j_2$ from 1 to $l$ where $j_1 < j_2$.

As more than one pertinent record may be on the same track,

Average $T_2 \le n_q \times$ (average seek time + average search time).

Thus, the average retrieval time for a query under all the stated assumptions is

$T_1$ + Average $T_2$.
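The average in (6.2) is straightforward to compute for a given file. The sketch below is illustrative: it assumes uniformly distributed records, one record of each type, and queries that fix the values of two distinct attributes. The timing figures passed to `t2_upper_bound` are hypothetical placeholders, not measurements from the report.

```python
from fractions import Fraction
from itertools import combinations
from math import prod

def average_records_per_query(n):
    """n_q from (6.2); n lists the number of values of each attribute."""
    l = len(n)
    total = prod(n)
    # Fixing attributes a and b leaves prod(n) / (n[a] * n[b]) matching records.
    pair_sum = sum(Fraction(total, n[a] * n[b])
                   for a, b in combinations(range(l), 2))
    return pair_sum * 2 / (l * (l - 1))

def t2_upper_bound(n, avg_seek, avg_search):
    """Upper bound on average T_2: one seek and one search per record."""
    return float(average_records_per_query(n)) * (avg_seek + avg_search)

n_q = average_records_per_query([9, 9, 3, 3, 3])  # file of Example 4.2
print(n_q)                                        # 621/5
print(t2_upper_bound([9, 9, 3, 3, 3], 0.075, 0.0125))  # seconds, illustrative timings
```

The bound is pessimistic whenever several pertinent records share a track, which is the reason the text states it as an inequality.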

7. MULTIPLE VALUED FILING SCHEMES AND PARTIALLY DELETED GEOMETRY

Consider an EG(N, s) and the spread $a_1x_1 + a_2x_2 + \dots + a_Nx_N = c$, where the $a_i \in GF(s)$ are fixed and $c$ varies over all elements of GF(s). Let $n_1, n_2, \dots, n_s$ be a set of positive integers with $l = s$ and $\max\{n_i\} \le s^{N-1}$.

Consider the element $a_1x_1 + a_2x_2 + \dots + a_Nx_N = \alpha_i$ ($i = 0, 1, 2, \dots, s-1$; $\alpha_i \in GF(s)$) of the spread and delete from it $s^{N-1} - n_{i+1}$ points. Then delete all lines which lie completely on any one of the elements of the spread. Some lines which do not lie completely on an element of the spread may be deleted because some of the points have been deleted, and at least two points are needed to define a line. There are some lines in this geometry which may have fewer than $s$ points. This partially deleted geometry has $\sum_{i=1}^{s} n_i$ points. The number of lines in this geometry depends on the $n_i$'s and also on the actual points which are retained, i.e., for a fixed set of $n_i$'s the number of lines may vary depending on the choice of the points which are not deleted. Such properties of partially deleted geometries are not very well known and no attempt will be made in this paper to develop such theories; but some interesting results which have been obtained by simulating on the computer will be stated later.

Consider a file with $s$ attributes in which the $i$th attribute has $n_i$ ($n_i > 0$, $i = 1, 2, \dots, s$) distinct values. Associate the $i$th attribute, $A_i$, with the element $a_1x_1 + a_2x_2 + \dots + a_Nx_N = \alpha_{i-1}$ of the spread in the partially deleted geometry. The $n_i$ points on this element are associated with the values of the attribute $A_i$. The lines of this partially deleted geometry are associated with the buckets of the filing scheme. The storing rule for the accession numbers of the records in the buckets is the same as before. The sub-buckets may be constructed in a similar manner as before. The exact number of buckets in the filing scheme cannot be stated. It is obvious that $b \le s^{2(N-1)}$, and it is difficult to state a lower bound for $b$. Given a set of $n_i$'s, there are some practical situations in which the minimum number of buckets is preferred, whereas in some other situations the maximum number of buckets is preferred. Some simulations were performed on the computer and some interesting numerical results were obtained, which are given in the following example.

Example 7.1

Consider an EG(3, 5). This geometry has 125 points, 775 lines and 155 planes. The planes can be divided into 31 groups of 5 each, where each group represents a spread of the geometry. A partially deleted geometry is constructed by deleting the following 38 points: 000, 002, 004, 010, 012, 014, 021, 022, 023, 030, 034, 100, 102, 104, 110, 111, 113, 120, 122, 124, 142, 231, 223, 300, 304, 323, 340, 344, 412, 421, 423, 424, 430, 43[?], 434, 441, 442, 443. This partially deleted geometry has 150 lines with 2 points each, 253 lines with 3 points each, 222 lines with 4 points each and 150 lines with 5 points each, i.e., deletion of the 38 points has not deleted any line completely. This has been possible because of the particular manner in which the 38 points were chosen. Consider the spread $x_1 = 0$, $x_1 = 1$, $x_1 = 2$, $x_1 = 3$ and $x_1 = 4$. These planes now have 15, 15, 23, 20 and 15 points on them. These 5 planes are now identified with a file with 5 attributes, 3 of which have 15 values each, 1 has 20 values and the other has 23 values. There are 150 lines which lie on these 5 planes. They are deleted from the partially deleted geometry and the remaining 625 lines are identified with the buckets of a filing scheme. Among these 625 lines, there are 110 lines with 2 points on each, 217 lines with 3 points on each, 186 lines with 4 points on each and 112 lines with 5 points on each. Thus, in the filing scheme, there are 110 buckets with 1 sub-bucket in each, 217 buckets with 3 sub-buckets in each, 186 buckets with 6 sub-buckets in each and 112 buckets with 10 sub-buckets in each.
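The counts quoted in Example 7.1 can be tied together with a few lines of arithmetic. The sketch below is illustrative: it uses the standard counting formulas for EG(3, s) and assumes that a bucket whose line retains $k$ points holds one sub-bucket per pair of points, that is $k(k-1)/2$. The bucket counts are those read from the example; `eg3_counts` is a hypothetical helper name.

```python
from math import comb

def eg3_counts(s):
    """Numbers of points, lines and planes of EG(3, s)."""
    points = s ** 3
    lines = s ** 2 * (s ** 3 - 1) // (s - 1)
    planes = s * (s * s + s + 1)
    return points, lines, planes

# Buckets grouped by the number of points left on the line (from Example 7.1).
buckets_by_points = {2: 110, 3: 217, 4: 186, 5: 112}

sub_buckets_per_bucket = {k: comb(k, 2) for k in buckets_by_points}
total_buckets = sum(buckets_by_points.values())
total_sub_buckets = sum(cnt * comb(k, 2) for k, cnt in buckets_by_points.items())

print(eg3_counts(5))            # (125, 775, 155)
print(total_buckets)            # 625
print(sub_buckets_per_bucket)   # {2: 1, 3: 3, 4: 6, 5: 10}
print(total_sub_buckets)        # 2997
```

The total of 625 buckets equals $s^{2(N-1)}$ for s = 5 and N = 3, so in this example the point deletions did not remove any bucket.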

This partially deleted geometry can be used to construct filing schemes for another 20 different types of files by associating them with different spreads. In these files, the different numbers of values that the attributes can take, i.e., $(n_1, n_2, n_3, n_4, n_5)$, are:

(14, 15, 16, 18, 25),  (14, 15, 17, 18, 24),  (14, 16, 18, 19, 21),
(14, 17, 18, 18, 21),  (15, 15, 17, 19, 22),  (15, 16, 16, 17, 24),
(15, 16, 18, 19, 20),  (15, 16, 17, 18, 22),  (15, 17, 17, 18, 21),
(15, 17, 18, 19, 19),  (15, 18, 18, 18, 19),  (16, 16, 16, 18, 22),
(16, 16, 17, 18, 21),  (16, 16, 17, 19, 20),  (16, 16, 18, 18, 20),
(16, 16, 18, 19, 19),  (16, 17, 17, 18, 20),  (16, 17, 17, 19, 19),
(16, 17, 18, 18, 19)  and  (17, 17, 18, 18, 18).

Thus, the following theorem is established.

Theorem 7.1

There exists a $\mathrm{MVFS}_2$ with parameters $t = 2$, $l = s$, $n_1, n_2, \dots, n_s$, where $n_i > 0$ for $i = 1, 2, \dots, s$, and $b \le s^{2(N-1)}$.

Consider a file which has the following type of structure:

$l_1$ attributes with values between $s^{N-1}$ and $s^{N-2} + 1$

$l_2$ attributes with values between $s^{N-2}$ and $s^{N-3} + 1$

$l_3$ attributes with values between $s^{N-3}$ and $s^{N-4} + 1$

. . .                                                      (7.1)

$l_{N-1}$ attributes with values between $s$ and 0

where $l_1, l_2, \dots, l_{N-1}$ satisfy the following conditions:

(i) the $l_i$'s are non-negative integers;

(ii) $\left[\dfrac{l_1}{1}\right] + \left[\dfrac{l_2}{s}\right] + \left[\dfrac{l_3}{s^2}\right] + \dots + \left[\dfrac{l_{N-1}}{s^{N-2}}\right] = s$.

For simplicity, it will be assumed that $n_1, n_2, \dots, n_{l_1}$ lie between $s^{N-1}$ and $s^{N-2} + 1$; $n_{l_1+1}, n_{l_1+2}, \dots, n_{l_1+l_2}$ lie between $s^{N-2}$ and $s^{N-3} + 1$; ...; and $n_k$ for $k = \sum_{i=1}^{N-2} l_i + 1, \dots, l$ lie between $s$ and 0, where $l$ is the total number of attributes.

Consider an EG(N, s) and an (N-1)-dimensional spread in it. Choose $l_1$ elements of this spread and let them be represented by

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N &= \alpha_0 \\
a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N &= \alpha_1 \\
&\;\;\vdots \\
a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N &= \alpha_{l_1 - 1}
\end{aligned}
\tag{7.2}
$$

Consider the (N-1)-dimensional flat $a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N = \alpha_0$, delete from it $s^{N-1} - n_1$ points, and associate the remaining points with the $n_1$ values of $A_1$. Similarly, delete $s^{N-1} - n_2$ points from $a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N = \alpha_1$ and associate the remaining points with $A_2$, and so on up to $A_{l_1}$.

Let $[l_2/s] = s_2$. Choose another set of $s_2$ elements of the spread and partition each element into (N-2)-dimensional spreads. These $s_2 s$ elements may be represented as:

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1N}x_N &= \alpha_j \\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2N}x_N &= c_2
\end{aligned}
\tag{7.3}
$$

where the $a_{ij}$'s are fixed and $c_2$ varies over all elements of GF(s).

Choose $l_2$ elements out of (7.3) and delete all the points which lie on the remaining $s_2 s - l_2$ (N-2)-dimensional flats of (7.3). Take one of these remaining elements of (7.3), delete from it $s^{N-2} - n_{l_1+1}$ points, and associate the remaining $n_{l_1+1}$ points with the $n_{l_1+1}$ values of $A_{l_1+1}$. Similarly, delete $s^{N-2} - n_{l_1+2}$ points from another element of (7.3) and associate the remaining $n_{l_1+2}$ points with the $n_{l_1+2}$ values of $A_{l_1+2}$, and so on for the attributes $A_{l_1+3}, \dots, A_{l_1+l_2}$. This process is continued until all the attributes have been associated with different dimensional flats of the geometry. In this process of association, a partially deleted geometry with $n = \sum_{i=1}^{l} n_i$ points is obtained. From this partially deleted geometry all lines which lie completely on any flat which is associated with an attribute are deleted. Let us denote the partially deleted geometry by PDG(N, s, $s^N - n$). The exact number of lines in PDG(N, s, $s^N - n$) will depend on the $s^N - n$ points which have been deleted. As this deletion can be done in many ways, the number of lines will vary, but will be less than the expression given in equation (4.5). If the lines of this PDG(N, s, $s^N - n$) are associated with the buckets of a filing scheme and if the storing rule for the accession numbers of the records is the same as before, then the filing scheme will correspond to a $\mathrm{MVFS}_2$ for a file whose structure is given by (7.1). Thus, the following theorem is established.

Theorem 7.2

There exists a $\mathrm{MVFS}_2$ with parameters $t = 2$, $l = \sum_{i=1}^{N-1} l_i$, where $l_i \ge 0$ and $l_i$ attributes have values between $s^{N-i}$ and $s^{N-i-1} + 1$; $n_1, n_2, \dots, n_l$, where $n_i > 0$ for $i = 1, 2, \dots, l$; and $b \le s^{N-2}\{s\,\phi(N-1,0,s) - s_1\phi(N-2,0,s) - \dots - s_{N-2}\phi(1,0,s) - s_{N-1}\}$, where $s_i = [l_i / s^{i-1}]$ and $s$ is a power of a prime integer.
In most practical situations the structure of a file is not stated as in (7.1) but usually in the following form:

$l_1$ attributes with $v_1$ values

$l_2$ attributes with $v_2$ values

. . .                                                      (7.4)

$l_m$ attributes with $v_m$ values

where the $l_i$'s and $v_i$'s are non-negative integers. The problem is then to find a pair of $s$ and $N$, and apply theorem 7.2. For simplicity, it is assumed that $v_1 > v_2 > \dots > v_m$. Then $s$ and $N$ are chosen with the following properties:

(i) $s$ is a power of a prime integer.

(ii) There exist $m$ pairs of integers $(s_1, N), (s_2, N_2), (s_3, N_3), \dots, (s_m, N_m)$ which satisfy the following conditions:

$s_1 \ge l_1$ and $s^{N-1}$ is the smallest integer which exceeds $v_1$;

$s_2 s^{N-N_2} \ge l_2$ and $s^{N_2-1}$ is the smallest integer which exceeds $v_2$;

$s_3 s^{N-N_3} \ge l_3$ and $s^{N_3-1}$ is the smallest integer which exceeds $v_3$;

. . .

$s_m s^{N-N_m} \ge l_m$ and $s^{N_m-1}$ is the smallest integer which exceeds $v_m$;

and $s_1 + s_2 + s_3 + \dots + s_m = s$ and $N_i < N$ for $i = 2, 3, \dots, m$.

Having chosen $N$ and $s$ as satisfying the above conditions, it is easy to see that theorem 7.2 can be applied to construct the $\mathrm{MVFS}_2$ for the file structure given in (7.4).

ABSTRACT: This section presents the results of the model runs versus actual computer runs to illustrate the kind of accuracy attainable with the model's equation evaluation approach. The average error for the most complex access method tested (ISAM) under the conditions selected is well under ten percent. Generally speaking, the difference between the actual file layout and that of the model accounts for many of the errors.

For the file designer who does not have the resources to run the full FOREM I model, but who wishes to make a quantitative evaluation of a small number of primary key access method designs, we have provided printouts of the relevant FORTRAN subroutines.

FOREM I (FSSM) CALIBRATION

One of the major considerations with any model is its accuracy. This is a particularly crucial consideration in the case of FOREM I, which uses equations with complex assumptions to evaluate the elapsed time for the primary key access methods.

In calibrating and testing FOREM I, we have selected the most complex access method available, the IBM Indexed Sequential Access Method (ISAM), to determine whether accurate equations can be written. The measured results were obtained by John Barlow of SDD using a 360/Mod 50. Different processors will affect the results in some cases. As the tables included in this chapter indicate, the equation evaluation method can be quite accurate; the average error for all experiments is 8.3%. In some experiments, the actual file layout for the overflow area deviated from random; in these cases (which bear asterisks) the runs are shorter than they should be. Removing these cases results in an average deviation of 6.7%.

The  results  are  extremely  important  for  the  area  of  file  design  because  they 
prove  that  accurate  equations  can  indeed  be  written  for  complex  access  methods. 

This means that another avenue of estimating gross performance is open to the file designer. Even if he does not have a computerized model available, he can still hand-calculate timings for typical queries by inserting his parameters into the model equations. In order to make such calculations possible, we have included in this section printouts of the FOREM I FORTRAN access method subroutines.


TEST  ENVIRONMENT 

CPU  -  360/Mod  50 

I/O  Device  -  2314 

Access  Method  -  ISAM,  master  index  in  core,  other  indexes  and  overflow  records 
on  same  volume  as  prime  records. 

File  Size  -  100,000  records 

Record Size - 200 bytes, including 8 bytes key.

Blocking - Full track in prime area, unblocked in overflow area. (30 records/prime track, 20 records/overflow track.)

Records  Processed  -  5000 

CPU  Processing  Time  -  0 

NOTATIONS  USED  IN  TABLES 

Loading  -  creation  of  file. 

SR  -  sequential  retrieval. 

SSR  -  skip  sequential  retrieval. 

RR  -  random  retrieval. 

RU - random update.

RI - random insertion.

SSI - skip sequential insertion.

cyl  -  cylinder  overflow  (overflow  records  in  same  cylinder  as  prime  records), 

ind - independent overflow (overflow records in different cylinders from prime records).


*In order to have the model set-up and the real data set-up be as close as possible, the 0 percent overflow actually has a very small number of overflow records.

**The discrepancy between model and measured results is mainly due to the set-up of overflow records in the created data set. The overflow records belonging to the same track, for example, are stored very close to each other in the actual set-up. In the model, each pair of records is considered to be separated by half as many cylinders as there are overflow cylinders.

***The error in this case is due to the assumption that a missed revolution occurs when control is returned to the CPU to set up the reading of the track index and next data track. In the particular data set chosen, no missed revolution occurs, probably because there is a considerable amount of empty space at the end of each data track.
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|  |)  =  To+(  n.Hl.hl  (  1  h'.  )  /hiM)«l  X  MV  (  l  .  -  hi.  hi  IV  (  if'.)  )  /  (  hlKMH  (  1  h  I.  )  4  (-  L  H  I.  h  I  I  I  h  i_  ) 
1  )  )  4  U  1 1  • 

1  ( i  h  |  1 1  =  I  i  >  +  X  i'l  ••=  h  I.  4 1 )  V  (  I  f  I.  )  4  /  •  4  f  A  I  U 

If  (  .}>  I  .  Hit:  .  CC  )  (ill  I  i.l  144 

I  U=  I  '.)+  x  >w4  f  I.  4,  I  M  (  I  r  I.  )  4  |  A  I  I J 
i - 1  f  IH  144 

■MIS  I  1 1  =  |  1 1  +  (  3.4  X  i  j  4  (  1  .  -  h  L  h  O  V  (  If.)  I  /  I  f’.Kkttl  I  f  I.  )  4  h  L  h  I.  h  1  (  I  f  I.  )  I  -  1  .  )  4  I  A  I  U 

I  h  (  Oil.  Ml-  .  ZL  )  (4i  HI  10H 

1 0= I o+ (  1.  *X.\*  (  I  .-h'.hUV  (  I  h  L  )  l/lh'.Kkhl  1  hi.  )  4hi.HI.hT  (  I  h  I.  )  )-l  .  )  4  j  A  TO 
OO  III  in* 

V)  I  A  I  = 1  . 

*  I  4  C  hi  =  A  I  4  h  I.  f.  I  h '[  l  I  f  I.  ) 

|  h  (  (  {  (,«/rN  ) -f  LMIKlChl/h.M)  l.hO.O.  )  Ml  I  U  104 
A  I =A1 +1  . 

Ml  III  114 

.10  4  I  1  C  =  C  hi  /  h  •_  n  l.  h  I  I  I  h  L  > 

M  IC=Ci1/h'i 

Ih  I  MM.O  I  .  (  hl.Hl.hT  (  J  hi.  )/?.  )  I  00  TO  110 

I  l)  =  T  D-M  (  h  IC  +  ?*  I  1C-1  .  )4{  1./  I  1C  )*(  XN«(  1  .-HLHUVI  I  hi.  )  )  / 

1  (H.KhHl  I  hi.  >  *  hi.  HI.  h  I  (  I  h  I.  )  )  )-l  .  )v.|  A  10 

Oil  III  111 

3  If,  1 1)=  in+  (  (  H  1C  +  I  IC  )4  |  1  ./  I  ic  )’M  XIM4  (  1  .-hLhilVl  I  h  L  )  )  / 

1  (  ht.Khtti  1  hi. )  4rlbl.hl  (  I  hi.  )  )  )-l  .  )vl  AT  0 

ill  1  h  t  nil. i'll:  .CC  )  on  I  u  inn 

1 1 )  =  i  n  +  h  I C  4  (  1 .  /  i  IC  )4(  xim4(  l  ,-H.hOV  II  hi.  ) 

1  )  /  (  f  I.  H  h  n  (  1  h  L  )  *  h  i.  h  l.  h  T  I  I  h(L  )  >  )  4  |  A  T  t) 

(mi  In  lo h 
I'll  rM:|, 

I  U=  |  U+XN4  (  1  .-hlhnvi  I  hi  )  )4(  h.4Ct  1LI  1  .  /  hl.HLH  ri  I  hi.  )  1  +  1  .  )4  |  A  II) 

1  /  f  1  Khh  (  I  f  I.  ) 

I  h  (  U'l.fj.CC  )  fu=  I  O  +  Ct  II.  (  1 .  /hi  hi.  hi  (  If  I.  )  )*TaTIJ*XnM  1  . -f  1.  hUVl  I  hi  )  ) 

l  /  h  I.  K  H  H  (  1  h  I.  ) 

If  I  (  hi.  Hi.  I  (  I  f  L  )-f  LHlh  I  (  I  fL  )  •  .01  .o.  1  >  I’D  TU  11? 

|  |)=t  |  |  l+XiM4f  LhuV  I  I  hi.  )*/  .4 

1  C  h  1 1. 1  1  .  /  h  I.  H  i.  h  I  (  I  h  I.  )  )  v  T  A  I  O  +  X  im  4  h  L  H I J  V  (  I  h  L  ) 

I  4  |  A  1 1 1 

On  In  111 
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312  TU=TI)  +  XN*FLPUV<  I FL )*2 .*Ct I L I  1 . /FLHLHT M FL >  > *TA TU 

313  IF  <<jU.fcO.CCI  Tl)=TU+XO*FLP(JV<  IFL  )*CfclL<  l./FLHLPH  IFL  I  )*IA1U 

399  FK  =2 » 

GO  TU.299 

304  IF  (FLPUV<  IFLI.fcO.O.  )  GU  Ul  313 
IF  (FLAMt  IFO.kQ.  IK)  GO  TU  316 
rL)=TU  +  HATO*(ONfc< XN*FLNHP ( I  FL  ) /FL  TnR  <  I  t-L  )  )>  + 

1  CAT01*C,JN*(UNfclXN#FLNCP{ 1FL)/FLTNH(  IFL)  !  I 

X  =  FlNC!)<  IFL  )/UVCFtJi  lUfcV) 

Y=FL00R( FLNC0<  IFU/UVOKUDtV  )  ) 

IF  IX.GT.l.)  GO  10  317 

T0  =  Tl)-f  F  LOOK  I  FLUOR  I  1 FL i *XN ) *  t  1  .  - 1  . /FL  NCIJ  (  I  FL  )  )  *2  . 

1  ^  C  Y  L  (  (FLNCUI  IFL  )/?.),  IlJfcV) 

GO  10  3  1  H 

317  T0*T0  +  FLUUK(FLPUV< IFL  )*XN1*  <  1  .-1 . /FLNCU(  l FL  I) *2  . 

1  #(Y*CATf  +  <  CATDl  +  f  <X-Y)/2.  )*<CATI>i  -CAT01  )  ) «  <  X-Y)  ) 

31 W  I  F  <  I  t  F'.BLFTI  IFL  »  )  •  G1  .  1  .  )  .UK.  <  IFLHLTI  IFL  I-FLHI.fT  f  I  HI  )  .M  .0. 1  )  I 
1  GO  TO  319 

IF  (FLPOVUFD.Gfc.0.3)  GO  TO  320 
I  0  =  T  U+2 . * 

1  XN*0.5*FLP0V( IFL )*2.*I ATU 

GO  TO  -<21  • 

320  TD  =  TIJ+(2.*xn-1.  )  *0.3*  I  A  TO 
GO  TO  321 

319  I  F  (  IFLPOVI  IFL  )*FLRLF1  (  I  FL  I  vFI.RPB  (  1  FL)  XI  1 .-FLPUV  (  I  FL  >  )  >  .  M-  .  1  .  ) 

1  GU  TO  322 

T()*TL>  +  XN*FLP0V(  IFL  )*0.3*TA  1 1)*2. 

GO  TO  321 

322  TOsTU  +  XN*(  1 ,-FLFOVl  IFL) ) *0 . 3*T A TU*2 . / ( F LKPH I  I FL  )  *fL6L FT (  I  r L  )  ) 

GO  TO  321 

316  TD=TU  +  FATO*(ONfc(  XN*FLNHP(  IF).  )/FL  |NK(  IFL  )))  + 

1  C0N*CAT01*(  ONtI  Xi\*FI.ivCF(  I  FL  ) /FL 1  NR  <  1  FL')  )  ) 

IF  (  (  XN*FL  PU  V  (  I F  L )  ) *  G I . <  XN«  FL  TnT  P (  I  Fl_  )  /  FL  TNR  (  IFL)  )  )  GO  lu  323 
AC  =  X.\l*FLPOV<  IFL  ) 

W=0. 

GO  TU  324 

323  Af sXN*FLTNTP( IFL  )/FLTNR< IFL  ) 

W  =  XN#FLPOV(  IFL  I  -AC 

324  IF! (FLNCPI IFL ) ) .GT. (UVCPrtI IOfcV)-FLNCU< IFL ) ) )  GO  Tu  323 
TO*  TD-M  4.  *UNE  (  AC  )  -1 .  )*CYL ( <  FLnCPI  IFL  I+FL'mCOI  J  f L  i  )  /  2  .  ♦  I  0 F  V  ) 

GO  TO  326 

323  X=(FLNCP(  IFL)/(DVCPhI lOfcV )-FLNCO<  IFL)  )  ) 

Y  =  FLOOK<  FLNCPI IFL  )/<UVCFH<  IFL )-PLImCU<  IFL)  )  ) 

T0=Tt)+<4.*UNfc  <  Al-l,  )*<  l./X  I’M  Y*CATF+CA  TL)l+<  <  X- Y  )  /  2.  I*  (CATOL-CA  lull 
1  ) 

326  TO=T0+  W*  U.-l./FLNCUI  IFL  )  )*C  YL  (FLWCfJl  IFL  ) /2.  ,  lOtV  )*?. 

GO  TO  318 

400  FN*FLOOK( 80NU/2. » 

ATEMsXN*  FLNCPI IFL  )/FLTNM IFL  ) 

WRITE  (6,7111)  Alt" 

WRITE  16,7111)  T  1  ,T  t) 

IF  ( FN.GT.FlblPTl IFL ) )  FN=FL 8LP1 ( I FL ) 


lNhUU  =  <MlAUCi')/XN,H.\|*>-lKF8(  I  H.  )  ,  (  Xim/ PL  I  NK  (  I  P L  )  >  * 

1  Hl.TN  I  P(  IFL  )  ^PLHLPT  (  I  M.  )/Pi\j ) 

TN8T»J  =  IM)AL  (CN/XN,FL*pP  I  (  1  PI.  )  ,  (  Xiv/M.Tnk  (  I  PL  )  )«PLTNTP(  I  Pi.  )) 

WKllhr  (8*7111)  lNHUO,  I IMH  lO 
I  I-  (  FLAM{  I  PL  )  .iMk  .  1  P  )  (MI  Tu  404 
**  l*>  I  >>  =  I  U+HA  I  (J*(llNb(  XI\l*Pl.iM8P(  1  hi.  )/Pt.  TNK  (  IPL  )  )  ) 
l  +C  A  Tl)  1  *  (  (I  lAlt  (  XN*PI.NCP<  IPL  )/PLT.mk(  !pl  1  )  ' 

l  +((  T'MttTO/AltM  )  /Ct  IL  (  FLlNTPUPl.  >/  P  *.  NC  P  (  l  P  L  )  )  )  ) 

"Klfc  (8*7111)  I  I  *  Tu 
ip  (  plmuv(  ipl  )  *(;i  .o. )  gu  ru  430 
‘♦-■'1  II-  (PL8LF1  (  IPL  )  .Ul:.l  .  )  GU  Tu  408 
It-  (  PN.Gfc.cLHL^T  (  IPL  )  )  Gu  Tu  408 
l  ►*  (  (  (  PI.HLMH  I  l-L  )  /P<M  )  -H.Ut)K  (I-L8LF1  t  IfU/PNI) 

1  .wc.O.)  GU  Tl)  40  7 

WKlIt  (8,7111)  TltTU 

I  0=  T  0+  (  <  PLHLPT (  1PD/PN+1.  ) * ( XN« (  1.  -PL  MOV  (  I  pL  )  )  /  (  P'.KPH  (  IPL  )  * 

1  FLMLPT  (  IPL  )  )  >+p  M#Ti\irtUU/PLHl_Pl  (  IpL  )-l  .  )  *  T  A  T  L) 

ir.Kllt  (8.7111)  TI.Tu 
IP  (  OU.PU.CC  )  I  u  =  1  U+  I  "irtUO^T  A  I  U 
‘■He  10=1  (>■*>(  X  N  *  P  L  P  U  V  (  I  PL  )  •‘■CN’i'PLPUVt  IPL)  )  *  1  AT  L) 
wwITt  (8,7111 )  1 I.Tu 

IP  (  Ou.fcU.CC  )  Tl)=Tl)  +  C'MvpLPUV(  1  PL  )*  I  A  I  u 
Ml  In 

-UP  I  '<=  I  U+  (  (  ?  .  »XiV«  (  l  .-PLFIJV  (  I  pi.  )  ) 

l  /(pLKPril  IpLI^PLHLPTI  IpL)  )  -  l  .  ) 

1  +1  iilrt  I  u  )  *T  A  1  u 

I*-  (  UH.Pu.LC  >  1 1J  =  TU+  I  whT(J«  I  A  In 

(-.n  III  40M 

450  Ip  (PLPPPIH  IPL). (0.1.)  GU  lu  43? 

-x=rL*<Mrt(  I  p,L  )#PL»LP  I  (  I  p L  )  #p'.PUV<  I  PL  )  /  (  1  *-.-L PUV  (IPL)) 

(-U  lu  433 

<*1/  PFs  l  . /Pl.KPPul  1  PI.  ) 

4-t3  I  n=  Tii  +  C  a  l  U  l  vpK^Ci'jvpLFUV  (  I  PI.  )  *Tnh  I  u/ A  r  tiv>  / 

1  C-  1L  (  pLTimI  M(  IPL  )/  rl.iMCP(IPL)) 

(Ml  l  u  43  1 


407  AIM, 

414  C »*•  =  A T  #PI.3LPT  (  I  PL  ) 

IP  (  (  (  O/Pm  )-PLLMiK  (  Oi/PIM  )  )  .b'-'.O.  )  Ml  Til  409 
A  I  S  A  I  +  1  , 


GU  IO  4 l 4 

‘•uw  I  I  L  =0  4/PI.8LP  i  I  IPL  ) 
H  I  C  =  C  Pi  /  P  I'l 


pi  K  I  ft  (6,  Mil)  l  I  C  ,  pj  I  C 

IP  (  pm.GT.  (  PLBLF  I  (  IpL  ) /?*  )  )  GU  III  410 

w  K  I  T  p  (8,  rfl  l  )  I  I,  lu 


1 11=  1 n  +  (  (  rt  IC  +  2*T  IC  )*XIM*  (  1  .“PL PIJV  (  l  PL  )  >  /  (  T  I  OFI.KPH  (  I  PI.  )  «FL8LPT  (  I  PL  )  ) 
I  -1  .  )  *T  AT  1 1 VI  N8UU*5 1  A1  U/K  IC 

(.'I  I  n  4  1  1 


4io  U)=  in 
1 
l 

WK  I  T 


+  (  (  M  1 C  +  /? I  I  C -  1  .  >*XNv  (  1  .-PL  PUV  I  I  PL  )  ) 

/(  TIC*FLKPH<  IPL  )* 

PL8LPTI  IPL)  )-l  .+  TiMHUg*<  1  .  +  1./HIC)  )«>TATU 
(8,7111)  fl.Tl) 
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411  It-  IOU.tO.CC!  T  0  =  I U+  T  rMttUO**  I  A  T  U 
GO  rt)  4() 8 

40b  FN  =  1  • 

T()=T  l)+XN*(  1  •  -F  L  PO  V  (  I  FU  )  *  (  C  fc  I  L  I  1  . /FLBLPT  <  I  FL  )  )  +  1  .  )  *  TA  1  <)/ r  i.k  Pn  (  Ir'.l 
1  +  (  1  .-FLPUV<  I  FL  )  )*Ct  IL  (  1  . /Fl.iLFT  (  1  FL  )  )  #TA'I  u«* Tnhou 

WHllfc  (6,7111)  T I , TU 

IF  (OU.tO.CC)  1  L)=  T IJ  +  (  l.-FLPUV(  IFL)  )*CtlL(  I./fLfiI.PT  (  IFI.  >  I^TAlU* 

1  T  NHUQ 

IF(  (FLHI.  f(  IFL  )-FLHLPl  (  lt-L  >  )  .(,T.O.  1)  GU  1  tJ  412 
I  U=Tl)+  (  XN*F LPUV(  I  Ft.  )  *Ct  I  L  <  l./FLHLPT(  I  F|_  )  ) +Xi\i*FL  PUV  (  In.) 

1  +CN*FLPOV<  IFL  >  ■»C  rr  IL  (  1  ./FL6I.P1  (  IH  )  >  )*TA  I  U 

WK  I  I  F  (6,7111)  T  I  ,  (  0 
GO  TO  615 

412  TU  = 

1  TO-*-  (  XN+CN  )*FLPOV  (  I  FL  )  *C  t 1  L  (  1  .  / FL 61. P  T  <  I  FL  )  )  w  I  a  I O 

wkITF  (6,7111)  T  I  ,  1  L) 

415  IF  (OO.FO.CC)  TD  =  TIJ+CN*FLPUV(  I  FL  >*Ct  I  L  (  1  ./FLHLpT  (  I  FL  )  >  :sTAT  O 
GO  TlJ  294 

406  IF  ( FLPOV(  IFL  )  .bO.O.  )  GU  lU  4  1  :> 

IF  (  FLAm(  IFL  )  .t(».  ItJ  )  GU  10  416 
WKITF  (6,7111)  T  I  ,  TlJ 

T  0=1  U+BATU*  (U«Mt  (  XNS'FLNrtP  (  IFL  )  /  FI.  f  ok  (  1  r  L  )  )  )  + 

1  CAT01-K  (  UNt  (  XN*rLwCF(  I  FL  )/FL  INK  (  I  fL  )  )  ) 

X  =  FI.'mCO(  I  FL  )  /OVCPrtI  IDhV  ) 

Y=FLOOK(  FL'MCnl  I  FL  )  /  UVCFm  (  I  UfcV  )  ) 

IF  (  X.GT.  1.  >  GIJ  TO  4W 

1  l)=TU+  (  XN+CN  )WPLPIJV(  I  FL  >*(  1  .-1  ./FLfcCul  IFL  )  J*CYL  (  FI_imCu(  1  FL  >  /?.  .  I  UV  ) 

wk!  lb  (6,7111)  II,  I  U 
G0  III  61H 

4  17  1 p  =  TD+ ( Xn+CN ) *FLFOV(  IFL ) - ( 1 . - 1 . / F  L  NC  U (  IFL)  )  *(  1.7X) 

1  v(  Y*CAli)+( 

1  CA  rul  +  l X-Y ) v(CAl OL-CAT01  )/2.)*lx-Y)) 

416  JF  ( FLHLFl  I  IFL ) .GT. 1 . )  GO  10  202 
!F(  FLHLFT  <  IFL  )  .tO.  1  .  )  GO  Tu  4bO 

4b 1  IF  (FLFUV(  IFL ) .GT.O.b  ,  Go  10  440 

TU»T0+2.*(XN  +  CN)*FL**UV(  IFL  ) *0.  b* T  A T I) 

WK lit  (6,7111)  T  I  ,10 
GO  TO  431 

4b0  II-  (FL«PB(  IFL  )  .to.i.  )  GU  lu  202 
GU  TO  4b 1 

440  1  U  =  1  0+  (  XN  +  CN- 1.  )*0.b*TATl)*2. 

GO  TO  431 

4  19  IF  (  (  FLPUV(  IFL)*FLHLPl{lFL)*FLKPrt(  I  FL  )  /  (  1  ,-FI.POV  (  I  FL  )  )  I.Gt.l.) 

1  GIJ  TO  44  1 

1  U=TU  +  XN#FLMUV(  I  Fl_  )  *0. 3*1  A  I  I)+C’M*f  L  PUV  (  I  F  L  )  Is 0  .  b  *  T  A  I  0 
WkITF  (6,7111)  TI,TU 
GO  TO  431 

44  1  T0  =  T0+XN6(  1 .-FLPOVI  1  FL  )  )vU.b#TATU/(FLKPH<  I  fL  )  ^Fl.Hi.P  1  (  IFL  I  ) 

1  +  T  NB  T  0*0 . b*T  A I0«( 1 ./FLHLMT (  IFL )  ) 

GO  TO  431 

4  16  TIJ=Tl>  +  rtA  I  U*  (  ONF  I  XN’KFLNrtPI  I  FL  )  /  FL  I  NK  (  I  FL  )  )  ) 

IF  ( FLNCH( IFL ) .GT . ( OVCFHl IOtV l-FLNCU( IFL > ) )  GO  IO  442 
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PCft  «  =C  YL  (  I  PL'mCX  1  M_  )  +  hl.i\iCU(  I  PL  )  )/  2  .  ,  IOfcV  ) 

OCA  I  =C  Y!.  (  H.tYClJUt-L)  /?•  ,  IlJtfV  ) 

WK  I  I  p  (  6.  /I  ll  )  PCA  I  ,UCA  1 
Ml  i(J  4i,  -i 

X  =  pL'\iCP  (  I  PL  }  /  (  UVCP*  (  1 1  jr.  V  )  -M.imCLH  IrU'  l 
Y  =  i-UI'H  (  X  ) 

"*  K  1  I  P  (6,7111)  X  «  Y 

pC  a  i  =  (  i .  /  x  ;  *=  (  v?Cu  i  u  +  i 

l  lie  1 1;  1  +  (  (  X-Y  >  H.  >  *  (  c  A1 UL-CA  I  in  )  )  *  (  X-Y  )  ) 

"i.A  I  =  (  1  .  /X  )  V  <  Y  .'l  VL  (  PLWCUI  [  r )  /  2  .  t  1  I  >  V  )  +  (  X-Y  ) 

1  ~C  YL  (  (  X-Y  )  L-JCOI  1  PL  )/?•  ,  lUV  )  ) 

wK  |  I  r  (  6 ,  /  1  1  1  t  *CA  I  «  till  A  I 

•***“>  6  A  I M  I  H.  )  v-t.ttlt'  I  (  1  r  I.  )  *PLPHV I  IH.  )/(  1  .-hi  PI  IV  (  i  t-  L  )  ) 

I  P  (  kAi)  I  .  !*p  .  1  .  )  Ml  ru  999 

Mt  k  I  I  t  (  6  »  7  I  1  l  )  I  I  «  1 1 1 

I  .;=  i  i,  +  2.  v  (  Xix  +  O  )  -■•■•i-i.PIjV  (  I  PL  )-PC  AT+0  A  T  L)l*  (  OMfc  I  XiV*PL'M(',P  (  I  PL  )  / 

1  rLl  i'iK  (  I  PL  )  )  ••(  1  .-I-i.Xhh  (  I  PL  )*PI_rtLP  I  (  !  PL  )  *PLPOV  (  1  PL  )  / 

1  J  1  .  -p  LP'IV  (  I  PL  )  >  )  +  I  iMhUM*  (  PiV/U.Vt  (  PLHLP  I  (  I  PL  )  )  «2  .  *  (  1  .  / 

l  ( -IV  I  pC  I  OV  )-:  .  >  )  )) 

'-•■<11-  (6,7)11)  I  I  ,  I . , 

M  I  I  I  I  9  J  M 

99  9  I  !  =  I  <t  +  (  (  X'V*2  *  vH.  1 14 1 p  <  I  PL  )  /  PL  f  NK  (  I  PL  )  )  ♦Tin* TO* 
ABSTRACT: The advent of modern data management systems has raised the need for
models  of  such  systems,  and  for  computer  programs  which  embody  these  models  for 
simulation purposes. Typically, transactions on such systems result in complex
patterns  of  accesses  to  direct  access  storage  devices.  These  access  patterns 
are  dependent  on  several  characteristics  of  the  data  management  system,  among 
which  are: 

(1)  The  contents  of  the  data  base; 

(2)  The  organization  and  accessibility; 

(3) The nature of the request.

Furthermore,  once  the  sequence  of  requests  is  determined,  the  efficiency  of  the 
system  in  satisfying  these  requests  is  most  dependent  (or  potentially  so)  on  the 
hardware  configuration  itself.  Hence,  it  is  desirable  to  develop  models  which 
reflect  these  dependencies.  We  think  we  have  taken  a  step  in  that  direction. 


I.  INTRODUCTION 


The  advent  of  modern  data  management  systems  has  raised  the  need  for  models  of 
such  systems,  and  for  computer  programs  which  embody  these  models  for  simulation 
purposes.  Typically,  transactions  on  such  systems  result  in  complex  patterns  of 
accesses  to  direct  access  storage  devices.  These  access  patterns  are  dependent 
on  several  characteristics  of  the  data  management  system,  among  which  are: 

(1)  The  contents  of  the  data  base; 

(2)  The  organization  and  accessibility; 

(3) The nature of the request.

Furthermore, once the sequence of requests is determined, the efficiency of the
system in satisfying these requests is most dependent (or potentially so) on the
hardware configuration itself. Hence, it is desirable to develop models which
reflect these dependencies. We think we have taken a step in that direction.

The purpose of this section is to describe PHASE II, a model of data management
systems which has been implemented as a set of computer programs. Some of the
ideas for PHASE II evolved from experiences related to the development of an
earlier model, FOREM I, described in (1). The implementation of FOREM I,
which used analytic techniques and was very fast, was found, on the other hand,
to be deficient in several respects:

(1)  Some  configurations  of  data,  hardware,  and  access  methods  defied  analysis; 

(2) The introduction of a new parameter increased the complexity of the
resulting analysis manyfold;

(3)  The  analytic  programs  were  difficult  to  debug  and  verify; 

(4) It was impossible to simulate simultaneous I/O operations.

Therefore, we decided that a program more closely mirroring activity on computer
systems in general, and data management systems in particular, should be developed,
with the effect of sacrificing run time efficiency for flexibility, generality,
and ease of development, modification, and generalization. Insofar as the FOREM I
programs are valid, however, they can be adapted for use in the PHASE II system.

Program specifications and design are not stressed in this paper because they are
not complete or general and might tend to obscure the model itself. Specifications
of how to use the modeling programs and details of program design can be found in
Section 6 (the PHASE II User Guide).

II.  OBJECTIVES  OF  THE  MODELING  EFFORT 


The  PHASE  II  modeling  program  was  designed  with  several  objectives  in  mind: 

(1) To provide a means whereby data bases with known characteristics and
transaction sets and/or file activity profiles can be run against variations
in hardware configuration, physical arrangement of data on devices, data set
organization, and accessing strategy.

(2) To provide a means whereby general studies can be made for issuing
guidelines and trade-off curves for data base and retrieval system design; to
search out relationships between the characteristics of a data management
system environment; and to identify the most important characteristics of
a given subset of characteristics, that is, those to which the performance
of the system is most sensitive.

(3)  To  provide  diagnosis  of  and  possible  improvements  to  existing  systems  by 
examining  resource  utilization  statistics  for  I/O  bottlenecks. 

(4) To allow a modeler desiring to do (1), (2) or (3) to characterize a data
management system environment at the required level of detail for those
aspects of the system under scrutiny. The modeler will, in turn, be
furnished by the modeling programs with the required statistics for
evaluating the simulated system.

III. MODELING DATA MANAGEMENT SYSTEMS


It  is  useful  to  think  about  a  model  of  a  system  in  terms  of  two  major  aspects 
of  the  model: 


(A) The static model, which is a description of the logical and physical
configurations of the elements involved, and a stimulus to be applied to
the model.

(B) The dynamic model, which is a description of how the configuration changes
when a given stimulus is applied, and how long it takes.

These two submodels have direct analogues in the programs that implement the
model, typically assuming the form of program tables to implement the static
model, and executable program statements to implement the dynamic model.

For data management systems, the static model can be thought of as having four
major submodels:

(1) Logical description of the data;

(2) Hardware configuration;

(3) The mapping of the data base onto hardware devices;

(4) A description of the transaction to be performed.

The dynamic model has three major submodels:

(5) Identification of those data elements which need to be accessed to
complete the transaction;

(6) Locating those elements on the hardware devices;

(7) Accessing them in the desired order.

There exist simple, general models of (1) and (2). However, (3) is difficult
to characterize at once succinctly and with generality; as many schemes for
providing a data-device map exist as there are data management systems. Now,
(4) depends on (1), (5) depends on (4), (6) depends on (5) and (3), and (7)
depends on (4).

Because of these complex interdependencies, and an inability to characterize
succinctly some of the submodels, two restrictions have been placed on the kind
of system to be described by the PHASE II model:

(1) The data must be describable in terms of a hierarchical structure;

(2)  The  data  must  be  conventionally  stored;  that  is,  related  fields  are  stored 
together  in  "records,"  and  all  records  of  the  same  type  are,  in  some  sense, 
stored  together. 

These  restrictions  are  somewhat  vaguely  stated  here  with  the  intent  of  conveying 
the  modest  design  objectives  of  the  model.  Their  exact  meaning  is  specified  in 
the  sections  to  follow,  which  define  the  model  in  more  precise  detail. 

A subset of this model may, of course, be used to model systems which do not "fit"
the PHASE II model, but at a level more primitive than that implied by the above
discussion. For example, if a transaction set for a system can be characterized
by a sequence of accesses to well-defined locations on the hardware devices, thus
bypassing the logical data description and data-device map, the above
restrictions would not affect the applicability of this model to the system.

IV.  THE  PHASE  II  MODEL 

The  PHASE  II  model  allows  one  to  characterize  and  simulate  a  data  management  system 
with  respect  to  eight  aspects  of  such  a  system: 

(1) Data field contents;

(2) Logical structure of the data;

(3) Physical organization of the data;

(4) Data selection criteria;

(5) Data accessing methods;

(6) Accessing strategy;

(7) Hardware;

(8) I/O supervisor.

In  the  following  eight  sections,  we  will  discuss  each  of  these  aspects  and  indicate 
how  they  are  characterized. 

1. Data Field Contents

A data field is usually thought of as an item of information about a particular
entity; for example, a person's name, a company's assets, etc. A data field's
contents can be characterized in the form of a density distribution of its
values over all occurrences of the field in the data base. Any given value that
the field can take on is assumed to be uniformly distributed throughout all
occurrences of the field. The one exception to this is for a "sort" field, in
which case the order in which the values are presented in the distribution is
the order in which they will appear in the data set (to be defined later)
containing the field.

Fields are assumed to be statistically independent of each other and of other
system parameters (except for sort fields). Hence, data bases involving fields
with significant correlational effects will require careful treatment, perhaps
in some cases by lumping correlated fields together and treating them as a
single field.
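
As a rough illustration of this characterization, the sketch below represents a
field as a table of value fractions and, relying on the independence assumption
stated above, multiplies the selectivities of predicates on different fields.
The DEPT and YEAR distributions and the function names are hypothetical, chosen
only for illustration; they are not taken from the PHASE II programs.

```python
# Hypothetical density distributions: field value -> fraction of occurrences.
DEPT = {10: 0.5, 25: 0.2, 40: 0.3}
YEAR = {1963: 0.25, 1964: 0.25, 1966: 0.3, 1967: 0.2}


def selectivity(distribution, predicate):
    """Fraction of field occurrences whose value satisfies the predicate."""
    return sum(frac for value, frac in distribution.items() if predicate(value))


def joint_selectivity(*selectivities):
    """Combined fraction for predicates on statistically independent fields."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


dept_25 = selectivity(DEPT, lambda v: v == 25)      # about 0.2
since_1965 = selectivity(YEAR, lambda v: v > 1965)  # about 0.5
both = joint_selectivity(dept_25, since_1965)       # about 0.1
```

The product rule is only as good as the independence assumption; correlated
fields would have to be lumped into a single distribution, as noted above.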

2. Logical Structure of the Data

The logical structure of a data base imposes a relational structure on the fields
of the data base, and can be thought of as a "user's view" of the data base, as
opposed to the "system programmer's view" of the data base. Such a structure
has an existence independent of any associations of the data with a specific
data management system used to store and access the data. As was stated
previously, we confine ourselves to hierarchical data structures described as
follows:

Data fields are organized into groups of related fields, or "segments." A
segment may have sets of inferior segments related to it, thus inducing a segment
hierarchy, or tree structure, on the data.

For example, a personnel file may contain information having the structure
depicted in Figure 1. This structure consists of two hierarchical levels.
Level 0 contains nonrepeating information about an employee, and level 1 contains
two types of segments with recurrent information; namely, a list of positions
the employee has held with the company, and a list of his publications. The
latter two will, of course, occur different numbers of times for different
employees.


Employee segment          | NAME | NUMBER | ADDRESS |
(level 0 or master)


Job history segments      | DATE | TITLE | DEPT | SALARY |
(level 1)                 | DATE | TITLE | DEPT | SALARY |
                              .
                              .
                              .
                          | DATE | TITLE | DEPT | SALARY |


Publications segments     | DATE | PUBLICATION TITLE |
(level 1)                 | DATE | PUBLICATION TITLE |
                              .
                              .
                              .
                          | DATE | PUBLICATION TITLE |

FIGURE 1

Hierarchical Data Structure

A  general  structure  of  this  type  will  allow  subordination  to  any  level,  and  as 
many  different  segment  types  at  each  level  as  the  application  requires.  The 
one  restriction  is  that  a  strict  tree  structure  must  be  maintained,  that  is,  a 
segment  type  may  occur  at  only  one  level.  A  given  instance  of  such  a  structure 
(in  this  example,  all  the  information  about  a  particular  employee)  is  called 
a  "logical  record,"  and  the  collection  of  all  such  records  (the  personnel  file) 
is  called  a  "logical  file."  The  data  base  may  contain  several  logical  files. 
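
A minimal sketch of such a segment hierarchy follows, assuming a simple
in-memory tree. The class name SegmentType and the function levels are
hypothetical; the field names follow Figure 1 as far as it is legible. The check
rejects any segment type that appears twice, which is stricter than, but
implies, the rule that a segment type may occur at only one level.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentType:
    name: str
    fields: list
    children: list = field(default_factory=list)


def levels(root):
    """Return {segment type name: level}; raise if a type occurs twice."""
    found = {}
    stack = [(root, 0)]
    while stack:
        seg, level = stack.pop()
        if seg.name in found:
            raise ValueError(f"segment type {seg.name!r} occurs more than once")
        found[seg.name] = level
        stack.extend((child, level + 1) for child in seg.children)
    return found


# The personnel file of the example: one master segment, two level-1 types.
employee = SegmentType("EMPLOYEE", ["NAME", "NUMBER", "ADDRESS"], [
    SegmentType("JOB_HISTORY", ["DATE", "TITLE", "DEPT", "SALARY"]),
    SegmentType("PUBLICATION", ["DATE", "PUBLICATION_TITLE"]),
])
segment_levels = levels(employee)
```

Here segment_levels maps EMPLOYEE to level 0 and both repeating segment types
to level 1.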

3.  Physical  Organization  of  the  Data 

The physical organization of the data is a specification of how the logical files
are to be stored on physical devices. This logical-to-physical mapping is carried
out in three steps:

(a) Partitioning of logical records into "logical subrecords" to
form "data sets."

(b) Assignment of each data set to one or more "elementary files."

(c) Partitioning of elementary files into "extents" on hardware devices.

In the first step, each segment type (that is, all of its occurrences) is
assigned to a unique data set, which is defined as a collection of records
numbered from 1 to N. Figure 2 demonstrates such a partitioning for the
personnel file example cited in the previous section. The "employee segment"
and associated "job history" segments are assigned to data set 1, on a one
record per employee basis, as illustrated in Figure 3. All "publication"
segments are assigned to data set 2 on a one record per publication basis.


[Diagram: a logical record containing EMPLOYEE, JOB HISTORY, and PUBLICATIONS
segments, partitioned into logical subrecords; the EMPLOYEE and JOB HISTORY
segments form SUBRECORD 1.]

FIGURE 2

Partitioning of Logical Records into Logical Subrecords


DATA SET 1                                  DATA SET 2

EMPLOYEE 1 | Job History   (1 record)       PUBLICATION 1   (data set record 1)
EMPLOYEE 2 | Job History   (1 record)       PUBLICATION 2   (data set record 2)
etc.                                        etc.

FIGURE 3

Assignment of Logical Subrecords to Data Set Records

There are two restrictions on the way in which the above partitioning can be
carried out:

(1) If two segment types have been assigned to a data set, they must have
a common ancestor which is also assigned to the data set.

(2)  If  two  segments  which  are  lineally  related  are  assigned  to  the  same  data 
set,  segment  types  occurring  between  them  in  the  segment  hierarchy  must 
also  be  assigned  to  that  data  set. 
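
The two restrictions can be checked mechanically. The sketch below is one
possible reading, not the PHASE II implementation: it assumes that a segment
counts as its own ancestor for restriction (1), which the personnel example
requires (the employee segment is the common ancestor of itself and its job
history segments). The REVIEW segment type and the function names are
hypothetical, introduced only to exercise restriction (2).

```python
# Hypothetical hierarchy: segment type -> parent type (None for the root).
# REVIEW is an invented level-2 segment, added only to exercise rule (2).
PARENT = {
    "EMPLOYEE": None,
    "JOB_HISTORY": "EMPLOYEE",
    "PUBLICATION": "EMPLOYEE",
    "REVIEW": "PUBLICATION",
}


def lineage(seg):
    """The segment type itself followed by its ancestors up to the root."""
    chain = []
    while seg is not None:
        chain.append(seg)
        seg = PARENT[seg]
    return chain


def valid_data_set(assigned):
    """Check restrictions (1) and (2) for the segment types of one data set."""
    assigned = set(assigned)
    for s in assigned:
        for t in assigned:
            # Rule (1): each pair shares an ancestor (or self) that is assigned.
            common = set(lineage(s)) & set(lineage(t))
            if s != t and not (common & assigned):
                return False
        # Rule (2): every type between s and an assigned ancestor is assigned.
        chain = lineage(s)
        for i, ancestor in enumerate(chain):
            if ancestor in assigned and not set(chain[:i]) <= assigned:
                return False
    return True
```

For instance, valid_data_set({"EMPLOYEE", "JOB_HISTORY"}) holds, while
{"JOB_HISTORY", "PUBLICATION"} fails restriction (1) and {"EMPLOYEE", "REVIEW"}
fails restriction (2).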

The serialization of the records of a data set constitutes an important
interface between the data accessing methods and the data accessing strategy in
that the record number is the primary means of referring to information for
retrieval. The serial number also provides the ordering which forms the basis
for the notion of sort field.

At this point, we still are dealing with abstract, or logical entities, namely
logical records, logical subrecords, data sets, and records. A data set is
mapped onto physical devices by means of building blocks called "elementary
files." How many elementary files are needed to represent a data set, and what
their contents are, depends on how the data in that data set is to be accessed.
For instance, if the records of a data set are stored and accessed sequentially,
only one elementary file is required. On the other hand, if the records are
to be accessed "randomly" by means of indexes, the data itself, the indexes, and
other auxiliary files would each constitute an elementary file.

By  definition,  an  elementary  file  is  a  collection  of  information  recorded  on 
storage  devices,  with  the  following  properties: 

(1) It may reside on one or more devices of the same type (that is, having the
same physical characteristics), and occupy different numbers of cylinders
on each device.

(2) It occupies the same number of tracks on each cylinder, except possibly
the last cylinder occupied by the file.

(3) Physical record format (that is, record size and blocking characteristics)
is the same throughout the file.

Each elementary file is, in turn, partitioned into "extents" and so mapped onto
physical devices. Each extent is characterized by naming the device on which
the extent resides, the first cylinder to be occupied, and the number of
consecutive cylinders occupied.

The elementary file characterization is the main instrument used in locating a
given record of a data set (and auxiliary information, such as index records).
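
To suggest how an extent characterization can be used to locate a record, the
sketch below maps a data set record number to a device, cylinder, track, and
position on the track. It is an illustration under simplifying assumptions, not
the PHASE II locating logic: records are assumed to be stored in record-number
order, extents are filled in the order listed, and the records-per-track and
tracks-per-cylinder figures are constant. The names Extent and locate, the
device names, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    device: str
    first_cylinder: int
    cylinders: int  # number of consecutive cylinders occupied


def locate(record_number, extents, records_per_track, tracks_per_cylinder):
    """Map a 1-based record number to (device, cylinder, track, slot)."""
    if record_number < 1:
        raise ValueError("record numbers start at 1")
    track_index, slot = divmod(record_number - 1, records_per_track)
    cyl_index, track = divmod(track_index, tracks_per_cylinder)
    for ext in extents:
        if cyl_index < ext.cylinders:
            return ext.device, ext.first_cylinder + cyl_index, track, slot
        cyl_index -= ext.cylinders  # skip past this extent
    raise ValueError("record lies beyond the last extent")


# Hypothetical file: 30 records per track, 10 tracks per cylinder,
# two cylinders on DEV1 starting at cylinder 5, then three on DEV2.
extents = [Extent("DEV1", 5, 2), Extent("DEV2", 0, 3)]
first = locate(1, extents, 30, 10)
spill = locate(601, extents, 30, 10)
```

With these figures, record 1 falls at the start of cylinder 5 on DEV1, and
record 601 is the first record of the second extent on DEV2.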

4. Data Selection Criteria

Data selection criteria are roughly equivalent to what are commonly referred to
as "queries," in the sense that a query usually specifies a set of
characteristics which a logical record (or subrecord) must satisfy in order for
it to qualify for retrieval or further action.

For  example,  (going  back  to  the  personnel  file  example)  one  may  wish  to  retrieve 
personnel  records  of  all  persons  who  have  taken  positions  in  Dept.  25  since  1965. 
We  shall  assume  that  there  are,  on  the  average,  three  job  history  segments  per 
master  segment,  that  50%  of  them  have  dates  later  than  1965,  and  that  20%  of 
them  have  DEPT  =  25.  (The  above  information  would  be  available  from  the  field 
and  logical  structure  specifications.)  Assuming  field  independence,  10%  of 
the  job  history  segments  fully  qualify.  Further  assuming  that  the  segments 
that  thus  qualify  are  uniformly  distributed  throughout  the  file,  the  fraction 
of  people  qualifying  (that  is,  those  with  one  or  more  qualifying  job  history 
segments)  is: 

1 - (fraction of records with no qualifying job history segments)

= 1 - (probability that a job segment does not qualify)^3
= 1 - (1 - .10)^3 = .27

This kind of calculation can be extended to hierarchies of arbitrary depth and
complexity; however, the modeler should give careful consideration to the
assumptions involved.
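
The arithmetic of the personnel example can be reproduced in a few lines. The
function name is hypothetical, and the sketch makes the same simplification as
the calculation above: every record is treated as having exactly three job
history segments, although the example states three only as an average.

```python
def fraction_of_records_qualifying(segment_selectivity, segments_per_record):
    """Probability that at least one of n independent segments qualifies."""
    return 1.0 - (1.0 - segment_selectivity) ** segments_per_record


# 50% of segments have dates later than 1965 and 20% have DEPT = 25;
# under field independence, 10% of segments satisfy both conditions.
segment_selectivity = 0.50 * 0.20
fraction = fraction_of_records_qualifying(segment_selectivity, 3)
print(round(fraction, 3))  # prints 0.271, which the text rounds to .27
```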


A hierarchical model of a data structure introduces a semantic problem into the
query specification in that, to avoid ambiguity, a more complicated selection
specification is required than would be required for nonhierarchical data. This
can best be demonstrated by an example.


Consider a hierarchy consisting of two segment types: superior segment A and
inferior segment B. Each B segment has fields x and y. Such a hierarchy is
depicted in Figure 4.


[Diagram: segment hierarchy A over B (fields x, y), with specific instances
(a), (b), and (c).]

FIGURE 4

A Logical Data Structure and Specific Instances

Consider also specific instances of this structure (a), (b) and (c), also
depicted in Figure 4, and the following query:

"Find all segments A such that x = 1 and y = 2."

This  query,  as  it  stands,  might  have  several  interpretations,  as  supplied  by  the 
following  appendages  to  the  query: 

(1)  "co-occurring  in  a  subordinate  segment  B  at  least  once." 

(2) "anywhere, not necessarily co-occurring."

(3) "for all B segments inferior to A."

Of the specific instances (a), (b) and (c), the qualifying instances for the
three interpretations are:

1. (a) and (b)

2. (a), (b), and (c)

3. (a)

Therefore,  it  is  clear  that  what  is  needed  is  a  more  powerful  characterization 
of a query (or qualification specification) than can be supplied by a simple
Boolean  expression.  Such  a  characterization  can  take  the  form  of  a  statement 
with  the  following  form: 

"SEG  qualifies  by  criterion  LBL  if  it  has  QUANT  related  ELEMENTS 
that  satisfy  QUAL" 

where  the  capitalized  elements  are  defined  as  follows: 

LBL - an arbitrarily assigned qualification name or label

SEG - name of a segment

ELEMENT - a field or segment. An element is "related" to SEG if it
is SEG itself, a descendent segment of SEG, an ancestor
segment of SEG, or a field in any of these segments.

QUANT - a quantifier on ELEMENT

QUAL - a qualification criterion on ELEMENT. If ELEMENT is a field
name, QUAL will specify a subset of the range of the field.
If ELEMENT is a segment name, QUAL will be a reference to the
qualification label of a qualification statement on that
segment, or some Boolean combination thereof.

LBL   SEG   QUANT   ELEMENT   QUAL

Q     B             x         = 1
R     B             y         = 2
S     A     any     x         = 1
T     A     any     y         = 2
U     A     any     B         Q and R
V     A     any     A         S and T
W     A     all     B         Q and R

FIGURE 5

Resolution of the Ambiguity Problem

In  Figure  5,  queries  (1),  (2),  and  (3)  have  been  expressed  unambiguously  by 
qualification  statements  U,  V,  and  W,  respectively. 

Following are some examples of how the statements in Figure 5 are interpreted:

Q:  "A  B  segment  qualifies  by  criterion  Q  if  its  x  field  equals  1." 

(The  quantifier  is  not  necessary,  since  each  B  segment  has  exactly 
one  x  field.) 

U: "An A segment qualifies by criterion U if any of its B segments are
qualified by both criteria Q and R."

W: "An A segment qualifies by criterion W if all of its B segments are
qualified by both criteria Q and R."
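
The three interpretations can be made concrete with a short sketch. Because the instances of Figure 4 are not reproduced here, the instances (a), (b), and (c) below are hypothetical; they were chosen only so that the results agree with the qualifying instances listed in the text. The function names mirror the qualification labels Q, R, U, V, and W, with V written out as "S and T".

```python
from typing import Dict, List, Tuple

BSegment = Tuple[int, int]  # (x, y) field values of one B segment


def q(b: BSegment) -> bool:
    """Q: a B segment whose x field equals 1."""
    return b[0] == 1


def r(b: BSegment) -> bool:
    """R: a B segment whose y field equals 2."""
    return b[1] == 2


def u(bs: List[BSegment]) -> bool:
    """U: any B segment satisfies both Q and R (co-occurring)."""
    return any(q(b) and r(b) for b in bs)


def v(bs: List[BSegment]) -> bool:
    """V: S and T, i.e. x = 1 somewhere and y = 2 somewhere."""
    return any(q(b) for b in bs) and any(r(b) for b in bs)


def w(bs: List[BSegment]) -> bool:
    """W: all B segments satisfy both Q and R."""
    return all(q(b) and r(b) for b in bs)


# Hypothetical A instances, each given as the list of its inferior B segments.
instances: Dict[str, List[BSegment]] = {
    "a": [(1, 2), (1, 2)],
    "b": [(1, 2), (3, 4)],
    "c": [(1, 4), (3, 2)],
}

criteria = {"U": u, "V": v, "W": w}
qualifying = {
    label: sorted(name for name, bs in instances.items() if test(bs))
    for label, test in criteria.items()
}
print(qualifying)
```

Note that in this sketch an A segment with no B segments at all would qualify under W, since "all" is vacuously true; whether the model treats that case the same way is not stated in the text.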

A  transaction  set  on  a  particular  data  base  may  consist  of  many  thousands  of 
queries  and  updates.  Such  a  set  can  be  characterized  by  partitioning  the 
transactions  into  subsets  of  transactions  whose  form  is  the  same  within  subsets, 
but  whose  field  value  qualifiers  change  from  transaction  to  transaction.  Hence, 
in  addition  to  the  form  of  the  query,  the  modeler  would  need  to  supply  for 
each  queried  field  a  distribution  from  which  field  values  to  be  queried  on  are 
to be selected. This again makes certain assumptions about statistical
independence which may or may not be well-founded in specific instances. Once the
characterization  of  the  transaction  sets  is  made,  field  values  can  be  selected 
at  random  from  the  distributions,  the  transaction  so  defined  can  be  simulated, 
and  this  process  can  be  repeated  as  many  times  as  is  required  by  the  modeler. 
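
A minimal sketch of this sampling loop follows. It assumes that each queried field has a discrete distribution given as value-to-weight pairs and that each field is drawn independently; the function name, the field names, and the weights are illustrative assumptions, and the `simulate` argument stands in for whatever simulates one fully specified transaction.

```python
import random


def simulate_transaction_class(distributions, simulate, repetitions, seed=0):
    """Draw field values at random and simulate one transaction per draw."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        # Each queried field is drawn independently of the others; this is
        # the statistical-independence assumption mentioned in the text.
        values = {
            name: rng.choices(list(dist), weights=list(dist.values()))[0]
            for name, dist in distributions.items()
        }
        results.append(simulate(values))
    return results


# Hypothetical distributions for one query form: field value -> relative weight.
distributions = {
    "DEPT": {25: 0.2, 30: 0.5, 40: 0.3},
    "YEAR": {1964: 0.5, 1966: 0.5},
}

# Stand-in for the simulator: simply return the values that were drawn.
drawn = simulate_transaction_class(distributions, lambda values: values, repetitions=5)
print(drawn)
```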

Each  qualification  statement  defines  a  list  of  qualifying  records  of  the  data 
set  in  which  the  qualified  segment  appears.  How  this  list  is  used  in  character¬ 
izing  the  accessing  of  records  is  described  in  the  next  two  sections. 

5.  Data  Accessing  Methods 

Once  the  modeler  has  defined  the  elementary  files  of  a  data  set,  he  then  needs 
to  specify  how  a  given  record  is  to  be  accessed  in  response  to  a  request.  That 
is,  he  must  specify  the  sequence  of  accesses  to  the  elementary  files  of  the 
data  set  which  ultimately  result  in  the  retrieval  of  a  requested  record.  Of 
course,  this  dynamic  aspect  of  the  retrieval  process  is  intimately  tied  to  the 
meaning  of  the  elementary  files  which  constitute  the  data  set;  in  fact,  it 
supplies the meaning.

Each  data  accessing  method  represents  a  different  way  of  retrieving  records  from 
data  sets.  Some  of  the  more  common  techniques  are: 

(1) Sequential access: This method consists of sequentially leafing through a
data set, record by record, until the requested record is reached. Sequential
access is very efficient if one wishes to access all the records of a data
set in the order in which they are stored. It allows anticipatory reading
and buffering, so that the requestor may not have to wait for I/O to take
place before he can process the next record.

(2) Indexed access: This method involves first referencing an index, which
can  give  either  the  approximate  location  of  the  desired  record,  to  which 
the  user  must  go  and  search  sequentially  until  he  finds  it;  or  the  location 
of  a  lower  level  index,  which,  in  turn,  specifies  either  the  above  mentioned, 
or  another  level  of  index. 

(3)  Direct  access:  This  method  allows  the  user  to  go  directly  to  the  record 
desired  in  that  the  record  is  requested  by  location  rather  than  by  name. 
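
The difference between the three techniques can be illustrated by counting how many references each needs to reach one record. The sketch below is a toy comparison under simplifying assumptions (records held in key order, a single index level holding the highest key of each block, the requested key always present); it is not a description of the simulation programs supplied with the model.

```python
from bisect import bisect_left


def sequential_access(keys, wanted):
    """Records examined when leafing through the data set from the start."""
    return keys.index(wanted) + 1


def indexed_access(keys, wanted, block_size):
    """One index reference, then a sequential search of the indicated block."""
    blocks = [keys[i:i + block_size] for i in range(0, len(keys), block_size)]
    index = [block[-1] for block in blocks]      # highest key in each block
    block_no = bisect_left(index, wanted)        # approximate location
    examined = blocks[block_no].index(wanted) + 1
    return 1 + examined                          # index reference + records


def direct_access(location):
    """The record is requested by location, so a single reference suffices."""
    return 1


keys = list(range(10, 1010, 10))                 # 100 records, keys 10..1000
print(sequential_access(keys, 570))
print(indexed_access(keys, 570, block_size=10))
print(direct_access(location=56))
```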

Each  of  these  methods  has  many  variations,  each  of  which  can  result  in  drastic 
variations  in  operating  characteristics;  thus,  it  is  almost  impossible  to 
provide  a  brief  characterization  of  an  accessing  method.  It  can,  however,  be 
characterized  by  a  computer  program  which  simulates  the  operation  of  such  a 
method.  Hopefully,  the  interfaces  between  such  a  program  and  the  simulation 
system  environment  with  which  it  interacts  can  be  straightforward  and  simple, 
so  that  a  modeler  wishing  to  simulate  his  own  accessing  technique  (and  familiar 
with  the  language  in  which  the  model  is  implemented)  would  need  only  a  minimal 
amount  of  instruction. 

In  the  present  model,  programs  are  provided  to  simulate  well  known  accessing 
methods  such  as  the  IBM  OS/360  Sequential  Access  and  Indexed  Sequential  Access 
methods.

6.  Accessing  Strategy 

Let  us  review  the  picture  of  the  model  that  has  been  presented  so  far.  We  have 
described  the  logical  description  of  the  data,  and  its  physical  realization  in 
the  form  of  data  sets.  The  data  accessing  methods  provide  us  with  a  way  of 
accessing  a  single  record  from  a  data  set.  The  qualification  specifications 
supply us with lists of records which need to be accessed to fulfill requests
for information from the system. The final step is to provide a way of describing
the order in which the records on the lists are to be accessed; that is, a
description of the interrelation of data sets and data set accesses in fulfilling
a query. This description is analogous to the lower level description
of an accessing method, which describes the interrelationships of the elementary
files of a data set in fulfilling a request for a single record of the data set.

In  general,  the  accessing  strategy  specification  allows  the  modeler  to  describe: 

(1)  Lists  of  records  to  be  accessed  from  the  data  sets  involved. 

(2)  The  accessing  method  to  be  used  in  accessing  a  given  set  of  records  from 
a data set.

(3) The order in which the accesses are to occur.

For example, a modeler may wish to read records 1, 3, 5, ... from sequential data
set A, records 200, 400, 600, ... from indexed data set B, and merge these
records onto sequential data set C. This example uses all of the above three
elements: the lists (1, 3, 5, ... and 200, 400, 600, ...), the accessing
methods (sequential and indexed), and the ordering (read from A, write to C,
read from B, write to C, ...).

Three  basic  specifications  are  used  to  characterize  such  strategies  (in  the 
form  of  a  simple  procedural  language): 

(1) The LIST specification, which defines a set of records to be accessed. Such
a list can be a literal list of record numbers, a sequential or skip
sequential list, a random list taken from a given distribution, or a
random or sequential list of records which qualify on the basis of a
qualification specification. This last mentioned option provides the
only link between the qualification specification and the accessing strategy.

(2)  The  ACCESS  OP  specification,  which  identifies  the  accessing  method  to  be 
used,  the  data  set  to  be  accessed,  and  the  record  to  be  accessed.  The 
last of these is obtained from a specified list, and removed from the list,
so  that  on  the  next  execution  of  the  statement,  the  "next"  record  on  the 
list  will  be  accessed  from  the  data  set. 

(3)  The  SYNC  specification,  which  allows  one  to  specify  a  random  interleaving 
of  operations  on  two  or  more  data  sets.  Such  a  specification  is  necessary 
to  describe  merge-type  operations. 
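
The merge example given earlier can be sketched with the three specifications. The code below is an illustration only: it models a LIST as a queue of record numbers, an ACCESS OP as taking the "next" record from a list, and SYNC as a random choice among the data sets that still have records listed. The function name, the trace format, and the use of a seeded random generator are assumptions made for the sketch, not the syntax of the procedural language.

```python
import random
from collections import deque


def run_merge(list_a, list_b, seed=0):
    """Interleave reads from A and B at random, writing each record to C."""
    rng = random.Random(seed)
    pending = {"A": deque(list_a), "B": deque(list_b)}     # LIST specifications
    methods = {"A": "sequential", "B": "indexed"}          # accessing methods
    trace = []
    while pending["A"] or pending["B"]:
        # SYNC: random interleaving among data sets with records remaining.
        data_set = rng.choice([name for name, recs in pending.items() if recs])
        # ACCESS OP: obtain the "next" record and remove it from the list.
        record = pending[data_set].popleft()
        trace.append(("read", data_set, methods[data_set], record))
        trace.append(("write", "C", "sequential", record))
    return trace


trace = run_merge([1, 3, 5], [200, 400, 600])
for step in trace:
    print(step)
```

Whatever interleaving is drawn, the records of each list are still accessed in list order, which is what removing each record from its list on access guarantees.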

7. Hardware

A  simple  hardware  configuration  is  assumed  for  the  purposes  of  this  model,  as 
depicted  in  Figure  6,  namely:  one  CPU  with  one  or  more  channels,  each  having 
one  or  more  control  units,  each  of  which  has  one  or  more  devices  attached. 

The  modeler  may  specify  that  a  control  unit  is  switchable  between  two  or  more 
channels. In such a case, a control unit may be logically attached to at most
one  channel  at  any  given  moment,  and  will  remain  attached  to  that  channel  until 
the  "current"  request  is  satisfied. 

The above summarizes the topology of the hardware elements. Each device (e.g.,
a  disk  drive,  drum,  etc.)  in  turn  is  characterized  by  assigning  it  to  a  device 
class,  all  of  which  have  the  same  physical  characteristics;  for  example,  all 
2311  disk  drives  form  a  device  class. 

Direct  access  devices  are  characterized  by  such  parameters  as  rotational  period, 
number  of  tracks  per  cylinder,  cylinder  access  time  (which  may  be  a  function  of 
two variables: current cylinder and sought cylinder), maximum record size, gap
factors,  and  so  on. 

8. Input/Output Supervisor

The function of the I/O supervisor is to accept requests, marshal them through
various queues, and see them through to completion. This component of the
model (like the accessing methods) is characterized by a program, which, usually,
is called upon by an accessing method, and, in turn, interfaces with the hardware
in its current state.

The I/O supervisor is implemented as a very simple event-driven queueing model,
in which the stations are devices, channels, and the CPU, and the events are
begin and end seek, begin and end transmit, and begin and end CPU processing.

It  essentially  assumes  all  the  functions  of  an  operating  system  (other  than  data 
management);  hence,  the  name  is  something  of  a  misnomer.  However,  I/O  events 
are  assumed  to  be  the  predominant  concern  of  this  model. 

Briefly (see Figure 7), an I/O request to the I/O supervisor is specified in the
form "request I/O from device N1, cylinder N2, track location X1, transmit time
X2, operation type N3 (read, write, etc.)." This request is placed on a seek
queue  for  the  requested  device,  and  when  channel,  control  unit,  and  device  are 
free,  the  seek  is  initiated,  thus  tying  up  the  device.  At  end-of-seek,  the 
request  is  placed  in  a  transmit  queue  for  the  appropriate  channel,  and  when 
channel  and  control  units  are  free,  and  the  requested  track  position  comes  under 
the  head,  transmission  takes  place.  This  operation  ties  up  device,  control  unit, 
and  channel.  Finally,  the  requesting  program  is  signalled  that  its  request 
has  been  satisfied.  The  I/O  supervisor  maintains  current  hardware  status  (arm 
position,  channel  busy,  etc.)  and  advances  the  clock. 
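
The timing of a single request through these events can be sketched as follows. This is a deliberately reduced illustration: it assumes the channel, control unit, and device are all free (so no queueing delay arises), that the track location is expressed as a fraction of a revolution between 0 and 1, and that the rotational position at time zero is a known fraction. The function and parameter names are hypothetical.

```python
def service_request(clock_ms, seek_ms, period_ms, track_location,
                    transmit_ms, initial_position=0.0):
    """Return (end of seek, begin transmit, end transmit) times in ms."""
    # Begin seek at the current clock; the device is tied up until end of seek.
    end_seek = clock_ms + seek_ms
    # Rotational position (fraction of a revolution) under the head at end of seek.
    position = (initial_position + end_seek / period_ms) % 1.0
    # Wait until the requested track position comes under the head.
    latency = ((track_location - position) % 1.0) * period_ms
    begin_transmit = end_seek + latency
    # Transmission ties up device, control unit, and channel.
    end_transmit = begin_transmit + transmit_ms
    return end_seek, begin_transmit, end_transmit


# Illustrative values only: 75 ms seek, 25 ms rotation, data starting
# half a revolution from the reference point, 5 ms transmission.
times = service_request(clock_ms=0.0, seek_ms=75.0, period_ms=25.0,
                        track_location=0.5, transmit_ms=5.0)
print(times)
```

A fuller version would keep a seek queue per device and a transmit queue per channel and advance the clock from event to event, as the text describes.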

An  accessing  method  may  also  issue  a  WAIT  for  a  particular  request,  and  it  may 
issue  a  PROCESS  for  a  given  time  T,  which  is  effectively  a  guarantee  that  the 
program  will  issue  no  more  requests  during  time  T. 

This  model  allows  the  modeler  to  simulate  the  effects  of  device  and  channel 
separation  on  data  sets  simultaneously  being  accessed.  If  such  detail  is  not 
required,  simpler  I/O  supervisor  programs  can  be  substituted,  or  in  fact,  it 
may  simply  be  ignored  by  accessing  modules  which  compute  their  own  timing 
characteristics.

V.  CONCLUSION 

We  have  attempted  in  this  section  to  describe  a  model  of  a  certain  type  of  data 
management  system.  The  restrictions  placed  on  the  type  of  system  which  "fits" 
the  model  are  sufficiently  severe  to  make  the  resulting  model  relatively  simple 
(that  is,  relative  to  a  full-blown  model),  yet  general  enough  to  model  a  wide 
range  of  possible  systems. 


FIGURE 7

I/O Supervisor

Furthermore,  we  believe  that  the  design  of  model  and  programs  will  allow  future 
generalizations  to  areas  not  touched,  and  that  the  modeling  effort  will  develop 
ways  of  thinking  about  such  systems  which  will  lead  to  more  general  models. 

VI. A NOTE ON THE IMPLEMENTATION OF PHASE II

Presently,  the  model  exists  as  a  program  at  one  of  its  specified  levels  of 
"completion."  Elements  of  all  the  above  described  aspects  have  been  included 
at  this  point.  The  program  consists  of  about  8,000  lines  of  FORTRAN  code,  and 
occupies a load module of 215K bytes, including tables.

The  speed  at  which  the  modeling  program  operates  is  roughly  proportional  to 
the  number  of  accesses  it  is  required  to  simulate,  with  a  rule  of  thumb  being 
500 microseconds (on the mod 91) per hardware access simulated. The output of
the model currently consists of timings of interest to the modeler. Future
levels of the model envision a statistics gathering capability which will (at
the  user's  option)  gather  information  on  wait  times,  queue  lengths,  hardware 
activity,  and  so  on. 

VII.  DOCUMENTATION  AND  DELIVERY  OF  THE  PHASE  II  SYSTEM 

The  documentation  for  the  Phase  II  system  consists  of  this  section  and  section  t> 
(the  Phase  II  User  Guide).  The  system  itself  will  be  delivered  on  a  magnetic 
tape  containing  six  files  whose  contents  are  described  in  section  11  of  the 
User Guide. They contain, among other things, source and object modules of the
system,  and  the  User  Guide.  Accompanying  the  distribution  tape  is  a  computer 
output  listing  which  gives  example  OS/360  Job  Control  Language  for  installing 
and  maintaining  the  system.  The  JCL  as  distributed  will  probably  need  to  be 
modified  somewhat  to  conform  to  installation  conventions. 

INTRODUCTION TO PHASE II


THE PHASE II DATA MANAGEMENT SIMULATION SYSTEM IS AN ATTEMPT TO
PROVIDE A SIMULATION MODEL OF COMPUTER SYSTEMS WHICH ARE DATABASE
ORIENTED, I/O BOUND, AND WHOSE SIGNIFICANT EVENTS OCCUR ON A
MILLISECOND TIME SCALE. IT IS ORIENTED TOWARD DATABASES WHICH
REPRESENT HIERARCHICALLY ORGANIZED DATA STORED IN A MORE-OR-LESS
CONVENTIONAL FASHION ON DIRECT ACCESS DEVICES.


WE ASSUME A SINGLE-CPU CONFIGURATION WITH A SINGLE-TASKING
OPERATING SYSTEM IN A BATCH ENVIRONMENT.


BRIEFLY, PHASE II ALLOWS A USER TO SPECIFY A HARDWARE
CONFIGURATION, A DATABASE DESCRIPTION, A MAPPING OF THE DATABASE ONTO
THE HARDWARE DEVICES, A SET OF DATA QUALIFICATION CRITERIA, AND
A PROCEDURE FOR CARRYING OUT THE DATABASE TRANSACTIONS.
CONCOMITANT FACILITIES, SUCH AS TABLES, DISTRIBUTIONS, AND LISTS, ARE
ALSO PROVIDED.


THE PRINCIPAL OUTPUTS OF THE MODEL ARE TIMINGS OF THE PROCESSES
OF INTEREST TO THE MODELER. FUTURE VERSIONS OF THE MODEL WILL
ALSO PROVIDE SUMMARIES OF VARIOUS OTHER STATISTICS TO BE GATHERED
BY THE MODEL, SUCH AS CHANNEL UTILIZATION, AVERAGE WAIT TIMES,
AND SO ON.


IT IS SUGGESTED THAT ONE NOT FAMILIAR WITH THE MODEL FIRST TURN
HIS ATTENTION TO SECTION 4, WHICH BY THE USE OF SIMPLE BUT
INCREASINGLY COMPLEX EXAMPLES CONVEYS THE FLAVOR OF THE MODEL.
THESE EXAMPLES SHOULD BE STUDIED IN CONJUNCTION WITH THE
APPROPRIATE TABLE DESCRIPTIONS IN SECTION 3, WHICH ALSO SUPPLY
DETAILED SPECIFICATIONS FOR REFERENCE PURPOSES.


A MORE COMPLETE TREATMENT OF THE MODEL ON WHICH PHASE II IS BASED
IS GIVEN IN "PHASE II - A DATA MANAGEMENT SYSTEM MODEL", BY THE
AUTHOR, AND IS A COMPANION DOCUMENT TO THIS ONE.

2.0  CONTROL CARDS


A MODEL SPECIFICATION CONSISTS OF "CONTROL CARDS" AND "INPUT TABLE
CARDS". THIS SECTION DEFINES THE TYPES AND MEANINGS OF THE CONTROL
CARDS.


EACH CONTROL CARD INDICATES TO THE SYSTEM A TYPE OF PROCESSING
TO BE PERFORMED; FOR EXAMPLE, READ HARDWARE TABLES, EXECUTE
PROCEDURE, AND SO ON. IN ANY RUN, ONLY ONE OCCURRENCE OF CONTROL
CARDS 1-11 MAY APPEAR, WITH THE EXCEPTION OF "PROCEDURE" AND
"EXECUTE", WHICH MAY BE RE-SPECIFIED TO PERMIT EXECUTION OF
SEVERAL PROCEDURES IN ONE RUN.

INPUT TABLES MAY BE SPECIFIED IN ANY ORDER.

CONTROL CARDS HAVE THE FOLLOWING FORMAT:

COL  1              16  20   25

     *KEYWORD       P   N

WHERE:

KEYWORD  SPECIFIES THE TYPE OF PROCESSING.

         WHEN KEYWORD SPECIFIES THAT A TABLE IS TO BE
         READ IN, P AND N ARE INTERPRETED AS FOLLOWS:

P        = BLANK     - PRINT TABLE AS READ IN ON STANDARD
                       OUTPUT
         = NON-BLANK - DO NOT PRINT

N        FORTRAN LOGICAL FILE FROM WHICH THE TABLE IS
         TO BE READ. IF BLANK OR ZERO, STANDARD INPUT
         IS ASSUMED, IN WHICH CASE THE APPROPRIATE
         INPUT TABLE CARDS IMMEDIATELY FOLLOW THE
         CONTROL CARD IN THE INPUT STREAM.


THE FOLLOWING CONTROL CARDS ARE DEFINED:

1.  *HARDWARE

    READ HARDWARE CONFIGURATIONS AND PHYSICAL CHARACTERISTICS

2.  *DEVICE CLASS

    READ PARAMETERS DEFINING DEVICE CLASSES. FOUR DEVICE CLASSES
    ARE BUILT INTO THE SYSTEM, AND MAY BE REFERRED TO BY NAME:
    2314,2321,2311,2302.

3.  *DATASETS

    READ DATASET CONFIGURATIONS AND PARAMETERS

4.  *SEGMENTS

    READ SEGMENT CONFIGURATIONS AND PARAMETERS

5.  *QUALIFICATION

    READ QUALIFICATION SPECIFICATIONS

6.  *PROCEDURE

    READ PROCEDURE

7.  *LISTS

    READ LIST SPECIFICATIONS

8.  *DISTRIBUTIONS

    READ DISTRIBUTION SPECIFICATIONS

9.  *TABLES

    READ TABLE SPECIFICATIONS

10. *EXECUTE

    EXECUTE PROCEDURE. IF NO PROCEDURE IS DEFINED, THE PROGRAM WILL
    BRANCH TO SUBROUTINE "ALTPR", TO BE SUPPLIED BY THE USER.

11. *END

    END OF PROCESSING

THE FOLLOWING THREE CONTROL CARDS ARE FOR USE AS DEBUGGING AIDS,
BUT ARE INCLUDED FOR THE SAKE OF COMPLETENESS:

12. *PRINT

    PRINT EACH TABLE AFTER IT IS INPUT AND AFTER IT IS INTERPRETED.
    SUCCEEDING OCCURRENCES OF THIS CARD WILL ALTERNATELY TERMINATE
    AND RE-INITIATE SUCH PRINTING.

13. *DUMP

    DUMP EACH TABLE AFTER IT IS INPUT AND AFTER IT IS INTERPRETED.
    SUCCEEDING OCCURRENCES OF THIS CARD WILL ALTERNATELY TERMINATE
    AND RE-INITIATE SUCH DUMPING.

14. *TRACE

    TRACE ROUTINES AS SPECIFIED BY A USER SUPPLIED "BLOCK DATA"
    PROGRAM.

3.0  INPUT TABLES


INPUT TABLE ENTRIES HAVE THE FOLLOWING FORMAT:

COL  1-4     5   6 - 15    16 - 71      72

     LABEL       KEYWORD   PARAMETERS   CONTINUATION

WITH THE FOLLOWING CONVENTIONS:

1.  PARAMETERS MAY BE SEPARATED BY COMMAS AND/OR ONE OR MORE
    BLANKS

2.  TWO CONSECUTIVE COMMAS INDICATE THE ABSENCE OF A PARAMETER

3.  A NON-BLANK IN COL. 72 MEANS THAT THE PARAMETER LIST
    CONTINUES ON THE NEXT CARD

4.  IF A CARD ENDS WITH A COMMA, CONTINUATION ON THE NEXT CARD
    IS ASSUMED

5.  COMMAS MUST NOT BE CODED FOR ABSENT TRAILING PARAMETERS

6.  THERE IS A LIMIT OF 115 CHARACTERS FOR A PARAMETER LIST.
    (A PARAMETER OF LENGTH N CHARACTERS COUNTS AS N+1
    CHARACTERS, AND IF THE PARAMETER LIST STARTS AFTER
    COLUMN 16, THE LEADING BLANKS ARE ALSO COUNTED)

7.  THE LABEL MUST START IN COL. 1

8.  THE KEYWORD MUST START IN OR AFTER COL. 6, AND END IN OR
    BEFORE COL. 15

9.  VALUES TO BE INPUT MAY BE REAL, INTEGER, OR ALPHAMERIC,
    AS IMPLIED BY THE MEANING OF THE PARAMETER. THE FORMS
    WHICH THESE VALUES MAY ASSUME ARE:

    INTEGER  A STRING OF CONTIGUOUS DIGITS, WHICH MAY BE
             PREFIXED BY A MINUS SIGN

    REAL     LIKE INTEGER, EXCEPT A DECIMAL POINT MAY APPEAR

    ALPHA    A STRING OF NOT MORE THAN FOUR CONTIGUOUS
             CHARACTERS. COMMAS, PARENTHESES, OR BLANKS MAY
             NOT APPEAR IN AN ALPHA PARAMETER.
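
The card layout and separator conventions can be illustrated with a small parser. The sketch below is not the Phase II input routine; it assumes the column layout label 1-4, keyword 6-15, parameters 16-71, continuation 72, and it represents an absent parameter as None. The function names and the sample cards are hypothetical.

```python
import re


def parse_parameters(field):
    """Split a parameter field on commas and/or blanks; None marks an absent one."""
    text = re.sub(r"\s*,\s*", ",", field.strip())   # blanks around a comma belong to it
    text = re.sub(r"\s+", ",", text)                # remaining blanks are separators
    text = text.rstrip(",")                         # trailing comma only means "continued"
    if not text:
        return []
    return [p if p else None for p in text.split(",")]


def parse_entry(card):
    """Return (label, keyword, parameters, continued) for one input table card."""
    card = card.ljust(72)
    label = card[0:4].strip()                       # columns 1-4
    keyword = card[5:15].strip()                    # columns 6-15
    field = card[15:71].strip()                     # columns 16-71
    continued = card[71] != " " or field.endswith(",")
    return label, keyword, parse_parameters(field), continued


# Hypothetical cards built so that each item starts in the assumed column.
card1 = f"{'D1':<4} {'DEVICE':<10}2314 0.25"
card2 = f"{'X':<4} {'FILE':<10}A,,B C,"
print(parse_entry(card1))
print(parse_entry(card2))
```

A complete reader would also join continued cards before splitting and enforce the 115-character limit; both are omitted here to keep the sketch short.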


THE FOLLOWING IS A DESCRIPTION OF INPUT CONVENTIONS AND
DEFINITIONS OF INPUT PARAMETERS FOR ALL TABLES. EACH DESCRIPTION
CONTAINS A DISCUSSION OF THE INPUT TABLE, FOLLOWED BY A
"PROTOTYPE" EXAMPLE OF THE TABLE, FOLLOWED BY DEFINITIONS
OF THE PARAMETERS USED, AND THEIR DEFAULT VALUES. IF NO DEFAULT
VALUE IS SPECIFIED, THE DEFAULT IS BLANK FOR A NAME FIELD; ZERO
FOR A NUMERIC FIELD.

INDENTATION IS USED IN THE KEYWORD FIELD AS A MATTER OF STYLE
ONLY, TO CONVEY "BELONGS TO" OR "SUBORDINATE TO" RELATIONSHIPS.

3.1  HARDWARE


THE HARDWARE TABLES DESCRIBE THE NUMBER, TYPES, AND
CONFIGURATIONS OF THE HARDWARE ELEMENTS (CHANNELS, CONTROL UNITS,
AND DEVICES) TO BE INCLUDED IN THE SYSTEM.

EACH CHANNEL IS LISTED AND, UNDER THAT CHANNEL, EACH OF THE
CONTROL UNITS ATTACHED TO THE CHANNEL. SIMILARLY, UNDER EACH
CONTROL UNIT ARE LISTED THE DEVICES ATTACHED TO THE CONTROL
UNIT. A DEVICE IS CONSIDERED TO BE A SINGLE DRIVE; THAT IS,
A 2314 FACILITY WOULD CONSIST OF EIGHT DEVICES ATTACHED TO
ONE CONTROL UNIT.

CONTROL UNITS MAY BE SWITCHABLE BETWEEN CHANNELS. TO INDICATE
SUCH AN OPTION, A CONTROL UNIT MAY BE LISTED UNDER MORE THAN
ONE CHANNEL. HOWEVER, THE ATTACHED DEVICES MUST BE LISTED ONLY
ONCE.

CHANNELS AND UNITS NEED NOT BE EXPLICITLY SPECIFIED. IF THEY
ARE NOT, AN IMPLIED (NAMELESS) UNIT AND/OR CHANNEL WILL BE
SUPPLIED BY THE SYSTEM.


*HARDWARE
NAME CHANNEL
NAME  UNIT
NAME   DEVICE    TYPE,TRKP
       .
      UNIT
     CHANNEL

PARAMETER DEFINITIONS

PARM  DEFINITION                                          DEFAULT

NAME  NAME OF CHANNEL, UNIT, OR DEVICE

TYPE  DEVICE TYPE. THERE ARE FOUR BUILT-IN TYPES:
      2302,2311,2314,2321

TRKP  INITIAL POSITION OF THE HEAD RELATIVE TO ZERO.
      THIS PROVIDES DIFFERENT RELATIVE ROTATIONAL
      POSITIONING ACROSS ACCESS MECHANISMS.
      0. <= TRKP <= 1.

3.2  DEVICE TYPE


A DEVICE TYPE IS A COLLECTION OF PARAMETERS REPRESENTING THE
PHYSICAL CHARACTERISTICS OF A KIND OF DEVICE; FOR EXAMPLE,
2311, 2321, 2314, AND 2302 (INCIDENTALLY THESE FOUR ARE BUILT
INTO THE SYSTEM AND NEED NOT BE SUPPLIED BY THE USER). EACH
DEVICE IN THE HARDWARE DESCRIPTION MUST ADOPT ITS
CHARACTERISTICS FROM ONE OF THE DEVICE TYPES.

IT IS ASSUMED THAT THE CYLINDERS OF A DEVICE ARE NUMBERED
1 - N, AND CAN BE DIVIDED INTO ACCESS ZONES, EACH HAVING
THE SAME NUMBER OF (CONTIGUOUS) CYLINDERS. FURTHERMORE, IT
IS ASSUMED THAT THE ACCESS ZONES CAN BE SIMILARLY SUBDIVIDED
INTO SUB-ACCESS ZONES. THIS ZONATION FORMS THE BASIS FOR
DESCRIBING DELAYS DUE TO ACCESS ARM MOVEMENT BETWEEN CYLINDERS
OF THE DEVICE. THESE "ACCESS TIMES" ARE A FUNCTION OF THE
NUMBER AND TYPES OF ZONE BOUNDARIES PASSED OVER, AND, FOR EACH
TYPE OF ZONE, CAN BE EXPRESSED AS A SINGLE SCALAR VALUE OR A
TABLE OF VALUES. IF THE DEVICE HAS NO ZONATION (OTHER THAN
CYLINDERS), ONLY ONE SCALAR OR ONE TABLE NEED BE SPECIFIED.


*DEVICE TYPE
NAME DEVICE    PER,TRKC,NCYL,CAT,TB,DRAT,TPC,
               DOV,KOV,VOC,
               NCYA,CATA,TABA,NCYS,CATS,TABS
END


PARAMETER DEFINITIONS

PARM  DEFINITION                                          DEFAULT

NAME  NAME OF DEVICE TYPE

PER   ROTATIONAL PERIOD OF DEVICE IN MS.

TRKC  TRACK CAPACITY - MAX SIZE (IN BYTES) RECORD THAT
      CAN BE STORED ON ONE TRACK

NCYL  NO. CYLS. PER DEVICE

CAT   BASIC CYLINDER ACCESS TIME IN MS.                   TB USED

TB    BASIC CYLINDER ACCESS TIME TABLE                    CAT USED

DRAT  DATA RATE IN BYTES/MS

TPC   NO. TRACKS PER CYLINDER

DOV   HARDWARE OVERHEAD FOR DATA - BYTES PER BLOCK

KOV   HARDWARE OVERHEAD FOR KEY - BYTES PER BLOCK

VOC   HARDWARE OVERHEAD VARIABLE - BYTES PER BLOCK

NCYA  NO. CYLINDERS PER ACCESS ZONE

CATA  CYL. ACCESS TIME BETWEEN ACCESS ZONES (MS)

TABA  TABLE OF CATA

NCYS  NO. CYLINDERS PER SUB-ACCESS ZONE

CATS  CYL. ACCESS TIME BETWEEN SUB-ACCESS ZONES BUT
      WITHIN ACCESS ZONES

TABS  TABLE OF CATS


TB, TABA, TABS ARE TABLES IN WHICH

ARG = NO. OF CYLINDER, ACCESS ZONE, OR SUBACCESS ZONE
      BOUNDARIES, RESPECTIVELY, TO BE PASSED OVER BY THE ACCESS
      MECHANISM.

VAL = TIME IN MS. FOR THE ACCESS MECHANISM TO PERFORM
      THIS MANEUVER


USE OF A SCALAR (CAT,CATA,CATS) INSTEAD OF THE TABLE IMPLIES
THAT THE TIME TAKEN IS INDEPENDENT OF THE NUMBER OF BOUNDARIES
CROSSED.


THE NUMBER OF BLOCKS THAT CAN BE ACCOMMODATED BY A TRACK IS
COMPUTED BY THE FOLLOWING FORMULA:

1 + (TRKC - KOV - KL - BLOCKSIZE)/(BLOCKSIZE + DOV + KOV
    + VOC * (BLOCKSIZE + KL))

WHERE KL = 0 IMPLIES KOV = 0 (KL = KEY LENGTH).
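
As a worked illustration, the formula can be evaluated as below. Two points are assumptions rather than statements from the text: that the quotient is truncated to a whole number of blocks, and that a block too large for the track yields zero. The parameter values in the example are illustrative only and are not the built-in values of any device type.

```python
def blocks_per_track(trkc, blocksize, kl, dov, kov, voc):
    """Number of blocks that fit on one track, following the formula above."""
    if kl == 0:
        kov = 0                      # KL = 0 implies KOV = 0
    numerator = trkc - kov - kl - blocksize
    if numerator < 0:
        return 0                     # assumption: not even one block fits
    denominator = blocksize + dov + kov + voc * (blocksize + kl)
    # Assumption: the quotient is truncated to a whole number of blocks.
    return 1 + int(numerator // denominator)


# Illustrative values: 3625-byte track, 500-byte blocks, no keys.
unkeyed = blocks_per_track(trkc=3625, blocksize=500, kl=0, dov=61, kov=20, voc=0.05)
# Same blocks with a 10-byte key, so the key overhead now applies.
keyed = blocks_per_track(trkc=3625, blocksize=500, kl=10, dov=61, kov=20, voc=0.05)
print(unkeyed, keyed)
```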

3.3  SEGMENTS


THE SEGMENT SPECIFICATION IS THE MEANS BY WHICH THE LOGICAL
ORGANIZATION OF THE DATABASE IS DESCRIBED, AND BY WHICH THE
ASSIGNMENT OF SEGMENTS TO DATASETS IS MADE. IT IS ALSO THE
BASIS FOR PHRASING QUALIFICATION SPECIFICATIONS.

A SEGMENT IS A COLLECTION OF FIELDS DESCRIBING AN ENTITY. FOR
EXAMPLE, NAME, AGE, AND SEX MAY BE USED TO DESCRIBE A PERSON.
DIFFERENT KINDS OF ENTITIES (FOR EXAMPLE, PEOPLE,
ORGANIZATIONS, AND BOOKS) CAN BE REPRESENTED IN THE SAME SYSTEM, EACH
HAVING ITS OWN SEGMENT TYPE AND COLLECTION OF FIELDS.

A HIERARCHICAL OR TREE DATA STRUCTURE IS ASSUMED; THAT IS, EACH
SEGMENT TYPE MAY HAVE ONE OR MORE "INFERIOR" SEGMENT TYPES,
EACH OF WHICH OCCURS A GIVEN NUMBER OF TIMES FOR EACH
OCCURRENCE OF ITS SUPERIOR SEGMENT.

EACH SEGMENT TYPE IS ASSOCIATED WITH A GIVEN DATASET. THIS
ASSOCIATION MEANS THAT ALL OCCURRENCES OF THAT SEGMENT WILL BE
ASSIGNED TO (STORED IN) THE SPECIFIED DATASET.

THE "DATASET MASTER SEGMENT" OF A SEGMENT IS THE HIGHEST
LEVEL SEGMENT SUPERIOR TO IT AND ON THE SAME DATASET. ALL
SEGMENT TYPES ASSIGNED TO A GIVEN DATASET MUST HAVE THE SAME
DATASET MASTER TYPE.

A "DATASET" RECORD CONSISTS OF A DATASET MASTER SEGMENT
TOGETHER WITH ALL SEGMENTS INFERIOR TO IT THAT HAVE ALSO BEEN
ASSIGNED TO THAT DATASET.

A FIELD MAY BE A SORT FIELD; THAT IS, FIELD VALUES FOR A SORT
FIELD WILL OCCUR IN THE DATASET IN THE ORDER IN WHICH THEY ARE
PRESENTED IN THE DISTRIBUTION OF THE FIELD. THE VALUES
OF NON-SORT FIELDS ARE ASSUMED TO BE
UNIFORMLY DISTRIBUTED THROUGHOUT THE OCCURRENCES OF THE FIELD.

A SORT FIELD MAY OCCUR ONLY IN A DATASET MASTER SEGMENT,
AND EACH DATASET MAY HAVE AT MOST ONE SORT FIELD.


***********************************************************************

*SEGMENTS
NAME SEGMENT   SIZE,SUP,DS,NPSS
NAME  FIELD    SIZE,DIST,TYPE,SIDS
     SEGMENT
END

***********************************************************************

PARAMETER DEFINITIONS

PARM  DEFINITION                                          DEFAULT

NAME  NAME OF SEGMENT OR FIELD. IT IS NOT NECESSARY TO LIST
      THE FIELDS OF A SEGMENT IF THEY ARE NOT GERMANE TO THE
      SPECIFICATION.

SIZE  FIELD SIZE OR INITIAL SEGMENT SIZE.
      AFTER INPUT, FOR SEGMENT THIS BECOMES:
      INITIAL SEGMENT SIZE + SUM OF FIELD SIZES IN SEGMENT

SUP   SUPERIOR SEGMENT NAME

DS    DATASET TO WHICH THE SEGMENT IS ASSIGNED

NPSS  NUMBER OF THESE SEGMENTS PER SUPERIOR SEGMENT.
      FOR A SEGMENT WITH NO SUPERIOR SEGMENT, THIS
      WILL BE THE TOTAL NUMBER OF THESE SEGMENTS (AND THE
      NUMBER OF RECORDS ON THE DATASET).

DIST  NAME OF THE DISTRIBUTION (IN THE DISTRIBUTION TABLE)
      OF THE VALUES OF THIS FIELD.

TYPE  FIELD TYPE
      S  = SORT FIELD
      SI = SECONDARY INDEX KEY FIELD (NOT IMPLEMENTED)

SIDS  NAME OF SECONDARY INDEX DATASET FOR THIS FIELD
      (NOT IMPLEMENTED)
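
The SIZE and NPSS definitions imply two simple derived quantities: the size of a segment after input, and the total number of occurrences of a segment type in the database. The sketch below computes both for a hypothetical two-level hierarchy; the class, the segment names MAST and JOB, and all sizes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Segment:
    name: str
    size: int                       # initial segment size (SIZE)
    sup: Optional[str]              # superior segment name (SUP), None at the top
    npss: int                       # occurrences per superior segment (NPSS)
    field_sizes: List[int] = field(default_factory=list)


def size_after_input(seg: Segment) -> int:
    """Initial segment size plus the sum of the field sizes in the segment."""
    return seg.size + sum(seg.field_sizes)


def total_occurrences(name: str, segments: Dict[str, Segment]) -> int:
    """Multiply NPSS values up the hierarchy to the top-level segment."""
    seg = segments[name]
    count = seg.npss
    while seg.sup is not None:
        seg = segments[seg.sup]
        count *= seg.npss
    return count


# Hypothetical hierarchy: 1000 master segments, three job segments per master.
segments = {
    "MAST": Segment("MAST", size=10, sup=None, npss=1000, field_sizes=[20, 2, 1]),
    "JOB": Segment("JOB", size=4, sup="MAST", npss=3, field_sizes=[4, 2]),
}
print(size_after_input(segments["MAST"]), total_occurrences("JOB", segments))
```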

3.4  DATASETS


A DATASET IS A LOGICAL COLLECTION OF RECORDS NUMBERED 1-N, AND
IS THE PRIMARY VEHICLE FOR REFERRING TO AND ACCESSING DATA.

EACH DATASET CONSISTS OF ONE OR MORE FILES. THE SALIENT FEATURE
OF A FILE IS THAT IT HAS THE SAME PHYSICAL RECORD FORMAT
THROUGHOUT; WHEREAS, FOR DIFFERENT FILES BELONGING TO A
DATASET, THIS MAY NOT BE TRUE.

FOR EXAMPLE, AN INDEXED DATASET MAY CONSIST OF A PRIME DATA
FILE, AN INDEX FILE, AND AN OVERFLOW FILE, ALL HAVING DIFFERENT
RECORD LENGTHS AND BLOCKING FACTORS.

A STRICTLY SEQUENTIAL DATASET CONSISTS OF ONLY ONE FILE.
SUCH DATASETS ARE CALLED "ONE-FILE" DATASETS.

A FILE IS SUBDIVIDED INTO "EXTENTS". THIS ALLOWS A FILE TO BE
SCATTERED OVER SEVERAL DEVICES, OR TO OCCUPY NON-CONTIGUOUS
AREAS ON THE SAME DEVICE. AN EXTENT IS CHARACTERIZED BY ITS
DEVICE, FIRST CYLINDER, AND NUMBER OF CYLINDERS.

MOST OF THE PARAMETERS DESCRIBED HAVE DEFAULT VALUES. IN FACT,
FILE AND EXTENT DESCRIPTIONS MAY BE OMITTED ENTIRELY WHERE THE
DEFAULTS SUIT THE USER.

IF THE EXTENTS SPECIFIED BY THE MODELER WILL NOT CONTAIN
THE FILE, ADDITIONAL EXTENTS WILL BE PROVIDED FROM UNOCCUPIED
DEVICES OF THE TYPE ON WHICH THE FILE IS TO RESIDE. SUCH
DEFAULT-ASSIGNED DEVICES WILL BECOME UNAVAILABLE FOR SUBSEQUENT
DEFAULT ALLOCATION (EVEN THOUGH THE OCCUPYING FILE DOES NOT
USE THE WHOLE DEVICE).

A FILE MAY "SHARE" EXTENTS WITH ANOTHER FILE; THAT IS, IT MAY
OCCUPY THE SAME CYLINDERS AS ANOTHER FILE (BUT DIFFERENT
TRACKS).

FOR A DEFINITION OF THE ACCESS METHOD DEFINED PARAMETERS
FOR EACH ACCESS METHOD, SEE SECTION 7.


*DATASETS

NAME   DATASET  TYPE,NREC,RSIZ,DEVT

PARAM  ACCESS METHOD DEFINED PARAMETERS

NAME   FILE     TYPE,DEVT,RPB,TPC,ALLT,ALL,BTYP,NBUF,
                WV,CH,EXT,RPC,KL
EXTENT DEV,CYL,NCYL


FILE
DATASET


END


PARAMETER DEFINITIONS

PARM  DEFINITION                                              DEFAULT

NAME  NAME OF DATASET OR FILE

TYPE  FOR DATASET, ACCESS METHOD TYPE. FOR A DESCRIPTION      S
      OF THE ACCESS METHODS, SEE SECTION 7.

      FOR FILE, IDENTIFICATION OF FILE TYPE. ALLOWABLE FILE
      TYPES DEPEND ON THE TYPE OF DATASET TO WHICH THE FILE
      BELONGS. THIS PARAMETER IS IGNORED IF THE DATASET IS A
      ONE-FILE DATASET.

NREC  NO. OF RECORDS ON DATASET. IF SEGMENTS ARE
      ASSIGNED TO THIS DATASET, "NREC" IS TAKEN TO BE
      THE NUMBER OF DATASET MASTER SEGMENTS. IF
      NO SEGMENTS ARE ASSIGNED TO THE DATASET, AND "NREC" IS
      NOT SPECIFIED, IT WILL BE COMPUTED TO BE THE NUMBER OF
      RECORDS THAT WILL BE ACCOMMODATED BY THE SPACE ALLOCATED
      TO THE DATASET, EITHER THROUGH THE "ALLT" AND "ALL"
      PARAMETERS OF THE ASSOCIATED FILE, OR BY THE EXTENTS
      PROVIDED BY THE USER (ONE-FILE DATASETS ONLY).

RSIZ  RECORD SIZE. THIS SIZE IS ADDED TO THE
      CONTRIBUTION IN RECORD SIZE DUE TO SIZES OF
      SEGMENTS (IF ANY) ASSIGNED TO THE DATASET.

DEVT  DATASET DEFAULT DEVICE TYPE. A SPECIFICATION HERE WILL
      CAUSE ALL FILES ASSOCIATED WITH THE DATASET TO BE
      ASSIGNED TO DEVICES OF THIS TYPE, IF THEY HAVE NOT BEEN
      OTHERWISE ASSIGNED.

      DEVICE TYPE TO WHICH THE FILE IS TO BE ASSIGNED.
      THIS PARAMETER MAY BE OMITTED IF ASSIGNMENT TO
      SPECIFIC DEVICES IS MADE BY USE OF EXTENTS, OR
      THIS FILE SHARES EXTENTS WITH ANOTHER FILE,
      OR ASSIGNMENT TO THE "DEVT" SPECIFIED BY THE
      DATASET IS DESIRED. A FILE MAY RESIDE ON ONLY ONE
      TYPE OF DEVICE.

RPB   NO. RECORDS PER BLOCK                                   1

TPC   NO. TRACKS PER ALLOCATED CYLINDER TO BE ASSIGNED        DEVICE TPC
      TO THIS FILE.

ALLT  ALLOCATION UNIT:                                        R
      REC FOR ALLOCATION IN RECORDS
      TRK FOR ALLOCATION IN TRACKS
      CYL FOR ALLOCATION IN CYLINDERS

ALL   NO. OF THE ABOVE UNITS TO BE ALLOCATED.
      DEFAULT: ENOUGH TO ACCOMMODATE "NREC" RECORDS.

BTYP  BUFFERING TYPE FOR SEQUENTIAL ACCESS.
      "M" = "MOVE", "L" = "LOCATE", AS DESCRIBED BY
      OS/360 DATA MANAGEMENT.

NBUF  NO. OF BUFFERS TO BE ASSIGNED WHEN THIS FILE IS
      OPENED. BUFFER SIZE IS DICTATED BY BLOCKSIZE.

WV    "WV" IF WRITE VERIFICATION IS TO BE PERFORMED
      FOR THIS FILE (NOT CURRENTLY IMPLEMENTED)

CH    "CH" IF COMMAND CHAINING IS TO BE USED (NOT
      CURRENTLY IMPLEMENTED)

EXT   NAME OF FILE WITH WHICH THIS FILE IS TO SHARE
      EXTENTS

RPC   NO. OF RECORDS OF THIS DATASET TO BE ASSIGNED PER
      ALLOCATED CYLINDER. DEFAULT: NO. OF RECORDS
      THAT CAN BE ACCOMMODATED BY "TPC" TRACKS.

KL    KEY LENGTH

DEV   NAME OF THE DEVICE THE EXTENT RESIDES ON

CYL   CYLINDER OF THE DEVICE ON WHICH THE EXTENT BEGINS

NCYL  NUMBER OF CONTIGUOUS CYLINDERS IN THE EXTENT.
      DEFAULT: NO. OF CYLINDERS ON OCCUPIED DEVICE.

3.5 QUALIFICATIONS

THE QUALIFICATION SPECIFICATIONS ARE ROUGHLY EQUIVALENT TO
QUERIES ON THE DATABASE. EACH QUALIFICATION DESCRIBES A
CRITERION WHICH A SEGMENT MUST MEET IN ORDER TO QUALIFY, AND
RESULTS IN A LIST OF QUALIFIED RECORDS ON THE ASSOCIATED
DATASET. THESE LISTS ARE MADE ACCESSIBLE TO THE PROCEDURE
THROUGH THE "SQ" AND "RQ" TYPE LISTS, OR THE "Q" MODIFIER
IN A PROCEDURE ACCESS OPERATION (SEE PROCEDURE SPECIFICATION).

THERE ARE THREE TYPES OF QUALIFICATION SPECIFICATION: "FIELD",
"BOOLEAN", AND "SEGMENT".

IN THE FOLLOWING, "FLD" REPRESENTS A FIELD NAME, "Q1" AND "Q2"
REPRESENT QUALIFICATION LABELS, AND "SEG" REPRESENTS A SEGMENT
NAME.

EACH QUALIFICATION IS A QUALIFICATION ON A UNIQUE SEGMENT
TYPE, AS FOLLOWS:

A FIELD QUALIFICATION IS ON THE SEGMENT CONTAINING "FLD".

A BOOLEAN QUALIFICATION IS ON THE SEGMENT QUALIFIED BY "Q1"
AND "Q2", WHICH MUST QUALIFY THE SAME SEGMENT.

A SEGMENT QUALIFICATION IS ON THE SEGMENT NAMED BY "SEG".

IN TURN, A QUALIFYING SEGMENT QUALIFIES THE RECORD THAT IT
BELONGS TO IN THE DATASET CONTAINING THAT RECORD. WHEN THE
QUALIFICATION TABLES ARE PRINTED BY THE PROCEDURE, THREE
ADDITIONAL PARAMETERS, "LRQ", "HRQ", AND "NRQ" ARE ALSO
PRINTED FOR EACH QUALIFICATION. THESE REPRESENT, RESPECTIVELY,
THE "LOW RECORD", "HIGH RECORD", AND "NUMBER OF RECORDS"
QUALIFIED ON THE APPROPRIATE DATASET.

A QUALIFICATION IS INTERPRETED AS FOLLOWS BY TYPE (SEE THE
PROTOTYPE QUALIFICATIONS BELOW):

FIELD   - A SEGMENT CONTAINING FIELD "FLD" QUALIFIES
          IF "FLD" BEARS RELATION "REL" TO VALUE "VAL".

BOOLEAN - A SEGMENT QUALIFIES IF IT ALSO QUALIFIES BY
          "Q1" "REL1" "Q2".

SEGMENT - SEGMENT "SEG" QUALIFIES IF IT HAS "REL2" "N"
          SEGMENTS THAT QUALIFY BY "Q3".
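
THE THREE INTERPRETATIONS ABOVE CAN BE SKETCHED IN PYTHON (USED HERE
ONLY FOR ILLUSTRATION; IT IS NOT PART OF THE MODEL). THE "Segment"
CLASS AND ALL FUNCTION NAMES ARE HYPOTHETICAL. THE SKETCH ASSUMES THAT
THE SEGMENTS QUALIFIED BY "Q3" ARE IMMEDIATE SUBORDINATES OF "SEG"; THE
CASE IN WHICH "Q3" QUALIFIES A SUPERIOR SEGMENT IS NOT COVERED.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical in-memory segment: field values plus subordinate segments."""
    name: str
    fields: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# The three relations named for REL (and reused by REL2).
RELATIONS = {
    "EQ": lambda a, b: a == b,
    "LE": lambda a, b: a <= b,
    "GT": lambda a, b: a > b,
}


def field_qual(fld, rel, val):
    """FIELD: a segment containing FLD qualifies if FLD bears REL to VAL."""
    return lambda seg: fld in seg.fields and RELATIONS[rel](seg.fields[fld], val)


def boolean_qual(q1, rel1, q2):
    """BOOLEAN: combine two qualifications on the same segment with AND / OR."""
    if rel1 == "AND":
        return lambda seg: q1(seg) and q2(seg)
    if rel1 == "OR":
        return lambda seg: q1(seg) or q2(seg)
    raise ValueError(f"unknown REL1: {rel1}")


def segment_qual(seg_name, q3, rel2, n=0):
    """SEGMENT: SEG qualifies if it HAS REL2 N subordinates qualifying by Q3."""
    def test(seg):
        if seg.name != seg_name:
            return False
        hits = sum(1 for child in seg.children if q3(child))
        if rel2 == "ALL":
            # Assumption: "ALL" is taken over every immediate subordinate.
            return hits == len(seg.children)
        return RELATIONS[rel2](hits, n)
    return test
```

FOR EXAMPLE, `segment_qual("EMP", field_qual("AGE", "GT", 10), "EQ", 1)`
ACCEPTS AN "EMP" SEGMENT HAVING EXACTLY ONE SUBORDINATE WHOSE "AGE"
EXCEEDS 10.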

A FIELD QUALIFICATION ON A SORT FIELD RESULTS IN QUALIFICATION
OF A SUBSET OF THE RANGE OF RECORDS IN A DATASET. FURTHER
USE OF SUCH A QUALIFICATION BY A SEGMENT QUALIFICATION (WHERE
THE SEGMENT BEING QUALIFIED IS IN THE SAME DATASET AS THE GIVEN
FIELD) WILL RESULT IN DERIVATIVE QUALIFYING SUBSETS OF THE
SAME TYPE. SUCH A QUALIFICATION IS CALLED A "SORT QUALIFICATION",
AND AN "OR" SEGMENT QUALIFICATION IS NOT PERMITTED IF
EITHER "Q1" OR "Q2" IS OF THIS TYPE.

***********************************************************************


*QUALIFICATIONS

NAME  QUAL  FLD,REL,VAL          (FIELD QUALIFICATION)

NAME  QUAL  Q1,REL1,Q2           (BOOLEAN QUALIFICATION)

NAME  QUAL  SEG,HAS,Q3,REL2,N    (SEGMENT QUALIFICATION)


END


PARAMETER DEFINITIONS

PARM   DEFINITION                                             DEFAULT

NAME   NAME OF QUALIFICATION

FLD    FIELD NAME

REL    RELATION: "EQ" FOR "EQUALS"
                 "LE" FOR "LESS THAN OR EQUAL TO"
                 "GT" FOR "GREATER THAN"

VAL    A VALUE OF THAT FIELD

Q1,Q2  NAMES OF QUALIFICATIONS. Q1,Q2 MUST BE QUALIFICATIONS
       ON THE SAME SEGMENT

REL1   "AND" OR "OR"

SEG    NAME OF A SEGMENT

HAS    A LITERAL "HAS"

Q3     NAME OF A QUALIFICATION. THE SEGMENT QUALIFIED BY
       Q3 MUST BE LINEALLY RELATED TO "SEG", THAT IS,
       ONE MUST BE A DIRECT DESCENDANT OF THE OTHER IN
       THE SEGMENT HIERARCHY.

REL2   EQ,LE,GT - SAME INTERPRETATION AS FOR REL              GT
       ALL - ALL SEGMENTS RELATED TO "SEG" MUST
             QUALIFY BY "Q3"

N      A NON-NEGATIVE INTEGER

3.6 PROCEDURE


THE PROCEDURE IS THE MEANS BY WHICH THE USER INSTRUCTS THE
SYSTEM WHAT IS TO BE DONE WITH THE CONFIGURATIONS OF HARDWARE,
SOFTWARE, DATASETS AND QUERIES DESCRIBED. IT ALSO PROVIDES
CERTAIN MODEL CONTROL AND DEBUGGING FACILITIES.

INTERPRETATION AND DEFAULT VALUES OF PARAMETERS ARE
DEPENDENT ON THE PROCEDURE OPERATION TYPE.


***********************************************************************

*PROCEDURE

LBL  OP  M  OBJ,LIST,SGO,FGO,TIME


END


PARAMETER DEFINITIONS

PARM     DEFINITION                                           DEFAULT

LBL      STATEMENT LABEL

OP       OPERATION TO BE PERFORMED (COLS. 6-14).
         MUST BE SEPARATED FROM A MODIFIER, IF PRESENT,
         BY ONE OR MORE BLANKS

M (MOD)  MODIFIER, WHICH ALTERS THE MEANING OF THE
         OTHER PARAMETERS FOR SOME OPERATIONS (COL 15).
         JUST LEAVE BLANK TO OMIT; DO NOT CODE A COMMA.

OBJ      OBJECT OF OPERATION

LIST     LIST TO BE USED BY THE OPERATION. IT MAY APPEAR
         AS A LITERAL LIST, IN THE FORM (X1,X2,...,XN).

SGO      LABEL TO GO TO IF OPERATION "SUCCEEDS"               NEXT STMT

FGO      LABEL TO GO TO IF OPERATION "FAILS"                  NEXT STMT

TIME     CPU TIME IN MS. TO BE ASSOCIATED WITH THIS
         OPERATION. IT IS APPLIED WHEN THE OPERATION
         IS COMPLETE.


OPERATIONS WHICH MAY BE USED IN "OP" FIELD:

PRINT    PRINT ON LOGICAL FILE "OBJ" (DEFAULT = STANDARD
         OUTPUT) THE TABLES NAMED IN "LIST". ELEMENTS
         OF THE LIST CAN BE:

         ALL  PRINT ALL TABLES
         DI   DISTRIBUTIONS
         TK   TABLES
         DP   DEVICE TYPES (OR PROTOTYPES)
         HD   HARDWARE CONFIGURATION AND PARAMETERS
         SG   SEGMENT (LOGICAL STRUCTURE OF DATA)
         PR   PROCEDURE
         LI   LISTS
         DS   DATASETS (PHYSICAL STRUCTURE OF DATA)
         QU   QUALIFICATIONS
         TI   TIMERS
         IO   I/O STATUS

         FOR SOME TABLES, SOME INTERNAL PARAMETERS ARE
         PRINTED OUT TO PROVIDE ADDITIONAL INFORMATION
         FOR THE MODELER. INTERNAL PARAMETERS ARE
         DEFINED IN SECTION 10 UNDER THE APPROPRIATE
         TABLE ENTRY.

DUMP     DUMP (PRINT EXTERNAL AND INTERNAL PARAMETERS) ON
         LOGICAL FILE "OBJ" (DEFAULT = STANDARD OUTPUT) TABLES
         NAMED IN "LIST", WHICH MAY INCLUDE ANY OF THE ABOVE,
         PLUS:

         DV   DEVICES
         CU   CONTROL UNITS
         CH   CHANNELS
         LD   LOGICAL DATA (SEGMENTS AND FIELDS)
         FD   FIELDS
         PD   PHYSICAL DATA (DATASETS, FILES, EXTENTS)
         FL   FILES
         EX   EXTENTS
         UT   UTILITIES (DISTRIBUTIONS, TABLES, LISTS)
         BU   BUFFERS
         <3   I/O QUEUES

TRACE    TRACE SUBROUTINES NAMED IN "LIST". A SUBROUTINE
         IS IDENTIFIED BY A FOUR CHARACTER STRING CONSISTING
         OF THE FIRST TWO AND THE LAST TWO CHARACTERS OF THE
         SUBROUTINE NAME. THE TRACE WILL APPEAR ON THE
         STANDARD OUTPUT. USE OF "EVNT" AS A SUBROUTINE NAME
         WILL CAUSE TRACING OF ALL I/O EVENTS. USE OF "ALL"
         WILL CAUSE ALL SUBROUTINES (AND I/O EVENTS) TO BE
         TRACED. SEE SECTION 9.3 FOR A LIST OF ALL
         SUBROUTINES IN THE SYSTEM, AND THEIR FUNCTIONS.
         ASSEMBLER LANGUAGE ROUTINES ARE NOT TRACEABLE.

NOTRACE  SUSPEND TRACING OF THE SUBROUTINES NAMED IN "LIST"

STRACE   AFTER EVERY 24 PROCEDURE STATEMENTS EXECUTED, PRINT
         THE 24 STATEMENT NUMBERS ON LOGICAL FILE "OBJ"
         (DEFAULT = STANDARD OUTPUT)

ERROR    IF AN ERROR OCCURS DURING EXECUTION, TRANSFER TO
         THE LABEL NAMED BY "OBJ". IF AN ERROR OCCURS AND
         NO "ERROR" STATEMENT HAS BEEN EXECUTED, THE LAST
         "ERROR" STATEMENT IN THE PROCEDURE WILL PRESCRIBE
         THE ACTION TO BE TAKEN.

         IF "OBJ" IS BLANK, THE STATEMENT FOLLOWING THE ERROR
         STATEMENT IS ASSUMED.

RESTORE  RESTORE THE SYSTEM TO TIME=0

TIME     SET TIMER "OBJ" TO ZERO

PTIME    PRINT TIMER "OBJ" (MS. OF ELAPSED SIMULATED
         TIME SINCE IT WAS LAST SET).

         ALL TIMERS ARE AUTOMATICALLY SET TO ZERO AT
         PROCEDURE INITIATION, AND BY "RESTORE".

END      END OF PROCEDURE. THIS DELIMITS PROCEDURE STATEMENTS,
         AND WHEN TRANSFERRED TO, WILL END PROCEDURE
         EXECUTION.

(BLANK)  NO OPERATION, BUT HONOR "SGO" AND "TIME"
         PARAMETERS

INIT     RE-INITIALIZE LIST "OBJ", AND/OR EACH LIST IN "LIST"
         TO ITS STATE AT PROCEDURE INITIATION.

SYNC     "SYNCHRONIZE" THE OPERATIONS NAMED AT THE LABELS IN
         THE LIST. THAT IS, A TRANSFER TO ONE OF THE SPECIFIED
         LABELS CAN RESULT IN A TRANSFER TO ANY ONE OF THE
         LABELS. THE PROBABILITY THAT A GIVEN LABEL WILL BE
         TRANSFERRED TO IS PROPORTIONAL TO THE CURRENT LENGTH
         OF THE LIST NAMED BY THE LIST ARGUMENT AT THAT
         PROCEDURE LABEL.

         "OBJ" SPECIFIES THE LABEL TO BE BRANCHED TO WHEN ALL
         THE LISTS ARE EMPTY.

         THIS OPERATION PROVIDES PARALLEL OPERATIONS ON
         DATASETS WHEN THERE IS NO FIXED PATTERN OF ACCESSES,
         FOR EXAMPLE, IN MERGE OPERATIONS.

         "SYNC" IS A PROCEDURE CONTROL OPERATION. ITS
         EXECUTION HAS NO IMMEDIATE EFFECT EXCEPT TO INDICATE
         TO THE SYSTEM THAT THE SPECIFIED STATEMENTS ARE TO
         OPERATE IN "SYNC" MODE.


READS    READ OR WRITE (USING SEQUENTIAL PROCESSING) A RECORD
WRITS    OF THE DATASET NAMED BY "OBJ". THE RECORD TO BE
         ACCESSED IS DEFINED BY THE NEXT NUMBER ON THE
         SPECIFIED LIST. SUCCESSIVE ACCESSES WILL BE MADE TO THE
         DATASET (STARTING AT RECORD 1 IF THE DATASET IS NOT
         OPEN) UNTIL THE REQUIRED RECORD IS ACCESSED, OR AN
         END-OF-DATA INDICATION IS RETURNED.
         THE RECORD NUMBER TO BE ACCESSED IS REMOVED FROM THE
         LIST.

         THE OPERATION SUCCEEDS IF THE REQUIRED RECORD CAN BE
         ACCESSED, AND TRANSFER IS MADE TO THE "SGO" LABEL.

         THE OPERATION FAILS IF THE LIST IS EMPTY, OR
         END-OF-DATA IS REACHED ON THE DATASET.

         DEFAULT FOR "LIST" IS A SEQUENTIAL LIST 1,2,...,NREC
         WHERE NREC IS THE NUMBER OF RECORDS IN THE DATASET.

         IF MOD="F", THE DATASET IS TAKEN TO BE THE DATASET
         ON WHICH THE FIELD NAMED BY "OBJ" RESIDES.

         IF MOD="Q", THE DATASET IS TAKEN TO BE THE DATASET
         ON WHICH THE SEGMENT QUALIFIED BY THE QUALIFICATION
         NAMED IN "OBJ" RESIDES. THE LIST WILL BE INFERRED TO
         BE A RANDOM/SEQUENTIAL ARRANGEMENT OF THE RECORDS
         OF THE DATASET QUALIFIED BY THE NAMED QUALIFICATION,
         AND THE LIST PARAMETER SHOULD NOT APPEAR EXPLICITLY.

         THE "F" AND "Q" MODIFIERS MAY BE USED WITH ANY
         OPERATION WHOSE OBJECT IS A DATASET.

READR    READ OR WRITE (USING RANDOM PROCESSING) A RECORD
WRITR    OF THE DATASET NAMED BY "OBJ". THE RECORD TO BE
         ACCESSED IS DEFINED BY THE NEXT NUMBER ON THE
         SPECIFIED LIST.

         THE RECORD NUMBER TO BE ACCESSED IS REMOVED FROM THE
         LIST, AND THE OPERATION FAILS WHEN THE LIST IS EMPTY.

         DEFAULT FOR "LIST" IS A RANDOM LIST OF NREC INTEGERS
         ON THE INTERVAL (1,NREC).

         MOD HAS THE SAME INTERPRETATION AS FOR READS, EXCEPT
         THAT FOR MOD="Q", THE INFERRED LIST WILL BE A
         STRICTLY RANDOM LIST INSTEAD OF A RANDOM/SEQUENTIAL
         LIST.

READD    READ OR WRITE (USING DIRECT PROCESSING) A RECORD OF
WRITD    THE DATASET NAMED BY "OBJ". THE PARAMETERS ARE
         INTERPRETED AS THEY ARE FOR READR AND WRITR.

WAIT     SUSPEND PROCESSING UNTIL COMPLETION OF THE FIRST
         DIRECT I/O REQUEST ON DATASET "OBJ" FOR WHICH A
         "WAIT" HAS NOT BEEN ISSUED.

UPDATE   UPDATE THE LAST RECORD READ FROM DATASET "OBJ"

OPEN     OPEN DATASET "OBJ" WITH THE PARAMETERS GIVEN IN THE
         LIST. THESE PARAMETERS ARE (STATUS,NBUF,BTYP,CH,WV),
         WHERE:

         STATUS:  R FOR READ SEQUENTIAL
                  W FOR WRITE SEQUENTIAL
                  X FOR RANDOM PROCESSING

         THE REMAINING PARAMETERS ARE AS DEFINED IN THE
         DATASET INPUT PARAMETER DESCRIPTION.

CLOSE    CLOSE DATASET "OBJ"
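
THE TRANSFER RULE DESCRIBED FOR "SYNC" CAN BE SKETCHED IN PYTHON AS
FOLLOWS. THIS IS AN ILLUSTRATION ONLY, NOT THE MODEL'S OWN CODE; THE
FUNCTION NAME "sync_transfer" AND THE DICTIONARY REPRESENTATION OF THE
LISTS ARE ASSUMPTIONS.

```python
import random


def sync_transfer(lists, done_label, rng=random):
    """Choose the label to transfer to among synchronized statements.

    lists maps each procedure label to the list named at that label;
    done_label plays the role of "OBJ" (taken when every list is empty).
    """
    labels = list(lists)
    weights = [len(lists[label]) for label in labels]
    if sum(weights) == 0:
        return done_label
    # Probability of each label is proportional to its current list length.
    return rng.choices(labels, weights=weights, k=1)[0]
```

WITH LISTS OF LENGTH 3000 AND 1000 AT LABELS "RD1" AND "RD2", "RD1" WOULD
BE CHOSEN ABOUT THREE TIMES OUT OF FOUR, SO THAT BOTH LISTS TEND TO BE
EXHAUSTED AT ABOUT THE SAME TIME.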


3.7 LISTS


THE LIST FACILITY PROVIDES A MEANS OF DEFINING LISTS FOR USE BY
THE PROCEDURE SPECIFICATION. THESE PRIMARILY WILL BE LISTS OF
RECORD NUMBERS TO BE ACCESSED FROM DATASETS IN THE SYSTEM. THE
USER CAN SUPPLY LISTS WHOSE CONTENTS ARE EXPLICITLY DESCRIBED,
OR HE MAY SPECIFY THAT A LIST IS TO BE DERIVED FROM A
QUALIFICATION SPECIFICATION.


*LISTS

NAME  LIST  TYPE,MQ,SIZE,LO,HS,DIST

VAL1,VAL2,...,VAL20
VAL21,VAL22,...,VAL40


...,VALN

LIST


END

*********************************************************************** 


PARAMETER DEFINITIONS

PARM  DEFINITION                                              DEFAULT

NAME  LIST NAME

TYPE  LIST TYPE:
      LL = LITERAL LIST. LIST VALUES (VAL1,...,VALN) ARE
           TO BE LISTED ON SUCCEEDING CARDS.
      SL = SEQUENTIAL
      RL = RANDOM
      RS = RANDOM/SEQUENTIAL (SORTED RANDOM)
      SQ = RANDOM/SEQUENTIAL BASED ON QUALIFICATION
      RQ = RANDOM BASED ON QUALIFICATION

MQ    MODE OF LIST, OR QUALIFICATION NAME FOR SQ, RQ TYPE
      LISTS. MODE CAN BE:
      I  INTEGER
      R  REAL
      A  ALPHAMERIC, ELEMENTS AT MOST FOUR CHARACTERS EACH
      FOR SQ, RQ TYPE, MQ CAN BE ANY QUALIFICATION NAME.

SIZE  SIZE OF LIST

LO    FOR SL, FIRST ELEMENT OF LIST
      FOR RL, RS, LOWER BOUND FOR ELEMENTS OF LIST

HS    FOR SL, SKIP FACTOR
      FOR RL, RS, UPPER BOUND FOR ELEMENTS OF LIST

DIST  FOR RL, DISTRIBUTION FROM WHICH ELEMENTS ARE TO
      BE TAKEN. DEFAULT = UNIFORM DISTRIBUTION ON (LO,HS)

VAL1  LITERAL LIST, 20 PER CARD, EXCEPT POSSIBLY
 ...  THE LAST
VALN
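
AS A ROUGH ILLUSTRATION OF THE "SL", "RL" AND "RS" LIST TYPES, THE
FOLLOWING PYTHON SKETCH BUILDS INTEGER LISTS FROM "SIZE", "LO" AND "HS".
IT IS NOT THE MODEL'S OWN CODE. THE FUNCTION NAME "make_list" IS
HYPOTHETICAL, THE SKIP FACTOR IS ASSUMED TO BE THE INCREMENT BETWEEN
SUCCESSIVE ELEMENTS, AND ONLY THE DEFAULT UNIFORM DISTRIBUTION IS
COVERED.

```python
import random


def make_list(list_type, size, lo, hs, rng=None):
    """Build an integer list of the given TYPE from SIZE, LO and HS."""
    rng = rng or random.Random()
    if list_type == "SL":
        # LO is the first element; HS is the skip factor (assumed stride).
        return [lo + i * hs for i in range(size)]
    if list_type in ("RL", "RS"):
        # LO and HS bound the elements; uniform choice is the assumed default.
        values = [rng.randint(lo, hs) for _ in range(size)]
        return sorted(values) if list_type == "RS" else values
    raise ValueError(f"list type not covered by this sketch: {list_type}")
```

FOR INSTANCE, `make_list("RL", 1000, 1, 112200)` GIVES 1000 RANDOM RECORD
NUMBERS IN THE RANGE 1-112200.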


3.8 TABLES


THE TABLE FACILITY PROVIDES A TABLE-LOOK-UP CAPABILITY. TABLES
CAN HAVE ARGUMENTS AND VALUES WHICH ARE INTEGER, REAL, OR
ALPHAMERIC (AT MOST FOUR CHARACTERS), AND MAY REPRESENT EITHER
STEP-FUNCTIONS, OR FUNCTIONS REQUIRING INTERPOLATION WHEN A
VALUE IS NEEDED CORRESPONDING TO AN ARGUMENT NOT EXPLICITLY
LISTED.

IN THE CASE OF A STEP-FUNCTION AN (ARG,VAL) PAIR MEANS THAT ANY
ARGUMENT GREATER THAN OR EQUAL TO ARG AND LESS THAN THE NEXT
ARG IN THE TABLE HAS VALUE "VAL".

AN INTERPOLATION TABLE MAY NOT HAVE ALPHAMERIC ARG'S OR VAL'S.
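
THE TWO KINDS OF LOOK-UP CAN BE SKETCHED IN PYTHON AS FOLLOWS
(ILLUSTRATION ONLY). THE FUNCTION NAME "lookup" IS HYPOTHETICAL. THE
TREATMENT OF ARGUMENTS OUTSIDE THE TABLE IS AN ASSUMPTION: AN ARGUMENT
BELOW THE FIRST ARG IS REJECTED, AND AN ARGUMENT AT OR BEYOND THE LAST
ARG TAKES THE LAST VAL.

```python
from bisect import bisect_right


def lookup(table, arg, kind="S"):
    """Look up arg in a table of (ARG, VAL) pairs with increasing ARGs.

    kind "S" treats the table as a step function; kind "I" interpolates
    linearly between neighbouring pairs.
    """
    args = [a for a, _ in table]
    i = bisect_right(args, arg) - 1          # last pair with ARG <= arg
    if i < 0:
        raise ValueError("argument below first ARG in table")
    if kind == "S" or i == len(table) - 1:
        return table[i][1]
    (a0, v0), (a1, v1) = table[i], table[i + 1]
    return v0 + (v1 - v0) * (arg - a0) / (a1 - a0)
```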


*TABLES

NAME  TABLE  TYPE,ARGT,VALT

ARG,VAL  (ONE PAIR PER CARD)


TABLE


END


PARAMETER DEFINITIONS

PARM  DEFINITION                                              DEFAULT

NAME  NAME OF TABLE

TYPE  S= STEP FUNCTION
      I= INTERPOLATE (LINEAR)

ARGT  ARGUMENT TYPE:
      I= INTEGER
      R= REAL
      A= ALPHAMERIC

VALT  VALUE TYPE (AS FOR ARGT)

ARG   ARGUMENT

VAL   VALUE ASSOCIATED WITH THE ARGUMENT

3.9 DISTRIBUTIONS


THE DISTRIBUTION FACILITY ALLOWS THE USER TO SPECIFY
DISTRIBUTIONS, CHIEFLY FOR DESCRIBING THE DISTRIBUTION OF THE VALUES
OF A FIELD, AND DISTRIBUTIONS OF RECORD NUMBERS TO BE ACCESSED
FROM DATASETS.


*DISTRIBUTIONS

NAME  DIST  TYPE,MODE

ARG,VAL  (ONE PAIR PER CARD)


DIST


END

***********************************************************************


PARAMETER DEFINITIONS

PARM  DEFINITION                                              DEFAULT

NAME  NAME OF DISTRIBUTION

TYPE  C= CONTINUOUS FOR REAL DISTRIBUTIONS
       = INTERPOLATE FOR INTEGER DISTRIBUTIONS
      D= DISCRETE FOR REAL DISTRIBUTIONS
       = NO-INTERPOLATE FOR INTEGER DISTRIBUTIONS

MODE  RANDOM VARIABLE TYPE:
      I= INTEGER
      R= REAL
      A= ALPHAMERIC

ARG   DISTRIBUTION ARGUMENT

VAL   DISTRIBUTION VALUE


RULES:

1.  IF TYPE=C, ARG'S MUST BE STRICTLY INCREASING

2.  IF MODE=A, TYPE MUST BE D.

3.  IF LAST VAL = 1., DISTRIBUTION IS ASSUMED TO BE
    IN CUMULATIVE FORM. AFTER INPUT, ALL DISTRIBUTIONS
    WILL BE STORED IN CUMULATIVE FORM.

4.  IF TYPE=C, FIRST VAL MUST BE 0.
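
RULE 3 CAN BE ILLUSTRATED BY A SHORT PYTHON SKETCH (NOT THE MODEL'S OWN
CODE). THE FUNCTION NAME "to_cumulative" IS HYPOTHETICAL, AND IT IS AN
ASSUMPTION THAT VALUES NOT GIVEN IN CUMULATIVE FORM ARE RELATIVE
FREQUENCIES WHICH ARE SUMMED AND NORMALIZED SO THAT THE LAST VALUE
BECOMES 1.

```python
from itertools import accumulate


def to_cumulative(pairs):
    """Return (ARG, VAL) pairs with VAL in cumulative form."""
    args = [a for a, _ in pairs]
    vals = [v for _, v in pairs]
    if vals[-1] == 1.0:
        # Rule 3: a last VAL of 1. marks an already-cumulative table.
        return list(pairs)
    total = sum(vals)
    cumulative = [running / total for running in accumulate(vals)]
    cumulative[-1] = 1.0      # guard against floating-point round-off
    return list(zip(args, cumulative))
```

A TABLE GIVEN AS (1,.6), (2,.3), (3,.1) WOULD THUS BE STORED AS
APPROXIMATELY (1,.6), (2,.9), (3,1.).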
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3.10  TABLE  SUMMARY 


FOLLOWING IS A SUMMARY OF SPECIFIABLE INPUT TABLES AND THEIR
INPUT PARAMETERS.


***********************************************************************

*HARDWARE
NAME  CHANNEL
NAME  UNIT

NAME  DEVICE  TYPE,THKP


UNIT


CHANNEL


END


***********************************************************************

*DEVICE TYPE

NAME  DEVICE  PER,TRKC,NCYC,CAT,TB,DRAT,TPC,

              OOV,KCV,VCC,

              NCYA,CATA,TABA,NCYS,CATS,TABS


* 

END 


*SEGMENTS

NAME  SEGMENT  SIZE,SUP,DS,NPSS

NAME  FIELD    SIZE,DIST,TYPE,SIDS


SEGMENT 


END* 


***********************************************************************


*DATASETS

NAME   DATASET  TYPE,NREC,RSIZ,DEVT

PARAM  ACCESS METHOD DEFINED PARAMETERS

NAME   FILE     TYPE,DEVT,RPB,TPC,ALLT,ALL,BTYP,NBUF,
                WV,CH,EXT,RPC,KL

EXTENT DEV,CYL,NCYL

FILE

DATASET

END


* 


*QUALIFICATIONS

NAME  QUAL  FLD,REL,VAL          (FIELD QUALIFICATION)
NAME  QUAL  Q1,REL1,Q2           (BOOLEAN QUALIFICATION)
NAME  QUAL  SEG,HAS,Q3,REL2,N    (SEGMENT QUALIFICATION)


END


***********************************************************************

*PROCEDURE

LBL  OP  M  OBJ,LIST,SGO,FGO,TIME


END


***********************************************************************


*LISTS

NAME  LIST  TYPE,MQ,SIZE,LO,HS,DIST

VAL1,VAL2,...,VAL20
VAL21,VAL22,...,VAL40


...,VALN

LIST


*TABLES

NAME  TABLE  TYPE,ARGT,VALT

ARG,VAL  (ONE PAIR PER CARD)


TABLE


END


***********************************************************************

*DISTRIBUTIONS

NAME  DIST  TYPE,MODE

ARG,VAL  (ONE PAIR PER CARD)

DIST

END

4.0 EXAMPLE MODEL SPECIFICATIONS


THE FOLLOWING PAGES CONTAIN SOME SAMPLE PROGRAMS, OUTPUTS, AND
EXPLANATORY MATERIAL. EACH EXAMPLE CONSISTS OF A DESCRIPTION
OF THE SYSTEM AND PROCESS TO BE MODELLED, NOTES ON THE
SPECIFICATION AND OUTPUT OF THE MODEL, AND THE ACTUAL MODEL
SPECIFICATIONS AND RESULTS. THE SAMPLE PROGRAMS WERE RUN ON
THE IBM RESEARCH 360/91 COMPUTER AT SAN JOSE, CALIFORNIA. THE
SIMULATED TIMINGS ARE NOT INTENDED TO REFLECT ACCURATELY ACTUAL
TIMINGS OBTAINABLE UNDER REAL CONDITIONS. THE MODEL IS STILL
IN ITS EXPERIMENTAL STAGES, AND CURRENTLY PROVIDES ONLY A
FUNCTIONAL CAPABILITY FOR USERS TO AUGMENT AND CALIBRATE TO
THEIR OWN SATISFACTION.

4.1 READ A SEQUENTIAL DATASET OCCUPYING A WHOLE 2314 DISKPACK, AND
FORMATTED ONE RECORD PER BLOCK, ONE BLOCK PER TRACK. TIME THE
PROCESS.


NOTES:

1.  THE HARDWARE CONSISTS OF ONE 2314 DEVICE (I.E. ONE DRIVE,
    NOT THE WHOLE FACILITY) ON AN IMPLIED CONTROL UNIT CONNECTED
    TO AN IMPLIED CHANNEL.

2.  THERE IS ONE DATASET, "DS", WHICH IS SEQUENTIALLY ORGANIZED
    ("S"), AND HAS 4000 RECORDS OF 7294 CHARACTERS EACH. IT IS
    TO RESIDE ON 2314 DEVICES (IN THIS CASE IT WILL JUST FIT ON
    ONE 2314 DISKPACK).

THE PROCEDURE SPECIFIES:

3.  PRINT THE DATASET PARAMETERS. NOTE THAT A DATASET CONSISTS OF
    "FILES", WHICH IN TURN CONSIST OF "EXTENTS". IN THIS CASE,
    "DS" CONSISTS OF ONE FILE, WHICH CONSISTS OF ONE EXTENT,
    ENCOMPASSING A WHOLE 2314 DISKPACK.

4.  READ A RECORD FROM "DS". THE SECOND PARAMETER SPECIFIES A
    LIST OF RECORDS TO BE READ FROM THE DATASET. AS EACH RECORD
    IS READ, IT IS REMOVED FROM THE LIST. THE SECOND, OR "LIST"
    PARAMETER HAS BEEN OMITTED IN THIS CASE, IMPLYING THAT A LIST
    CONSISTING OF ALL THE RECORDS OF THE FILE IS TO BE ASSUMED BY
    DEFAULT. THE THIRD PARAMETER, "R", SPECIFIES THAT IF THE RECORD
    IS SUCCESSFULLY READ (I.E. AS LONG AS THE LIST IS NOT EMPTY
    AND AN END-OF-DATA IS NOT REACHED ON "DS"), THE NEXT STEP OF
    THE PROCEDURE TO BE EXECUTED IS THE ONE LABELLED "R". WHEN
    "READS" FAILS, EXECUTION PROCEEDS WITH THE NEXT STEP OF THE
    PROCEDURE.

    THIS STATEMENT WILL, IN EFFECT, CAUSE THE WHOLE DATASET TO BE
    READ SEQUENTIALLY.

5.  PRINT TIMING STATISTICS. "SIMULATED TIME" IS THE AMOUNT OF
    TIME THE PROCESS WOULD TAKE AS COMPUTED BY THE SIMULATION.
    "REAL TIME" IS CPU TIME USED BY THE SIMULATION PROGRAMS.
    "REDUCTION" IS THE RATIO OF THE FORMER TO THE LATTER.

6.  AT PROCEDURE TERMINATION, A PROCEDURE STATEMENT TRACE IS
    PRINTED OUT. IT IS A LIST OF THE STATEMENT NUMBERS OF THE
    LAST 24 PROCEDURE STATEMENTS EXECUTED.

THE PHASE II DATA MANAGEMENT SIMULATION SYSTEM

4.2 DEFINE A PERSONNEL FILE, STORED IN SEQUENTIAL FORM ON A 2314
DISKPACK. READ THE DATASET.


NOTES:

1.  THE PERSONNEL FILE IS ORGANIZED AS FOLLOWS:

    EACH EMPLOYEE HAS A MASTER SEGMENT CONTAINING HIS NAME, AGE,
    AND EMPLOYEE NO. (NO). THESE FIELDS HAVE 20, 2, AND 10
    CHARACTERS, RESP. THERE ARE 1000 OF THESE MASTER SEGMENTS
    (THAT IS, 1000 EMPLOYEES), AND EACH SEGMENT HAS AN ADDITIONAL
    10 BYTES NOT ASSIGNED TO FIELDS. ASSOCIATED WITH EACH MASTER
    SEGMENT IS A LIST OF JOBS (JOB SEGMENT) AND A LIST OF THE
    EMPLOYEE'S CHILDREN (CHLD SEGMENT). EACH JOB SEGMENT HAS A
    JOB TITLE (TITL) FIELD AND A SALARY (SAL) FIELD, AND EACH
    CHILD SEGMENT HAS A NAME, AGE, AND SEX FIELD. THERE IS AN
    AVERAGE OF 3 JOB SEGMENTS PER MASTER SEGMENT, AND 2 CHILD
    SEGMENTS PER MASTER SEGMENT.

2.  ALL SEGMENTS HAVE BEEN ASSIGNED TO DATASET "DS", HENCE ALL THE
    INFORMATION ABOUT AN EMPLOYEE WILL BE STORED IN A SINGLE
    RECORD OF THAT DATASET, WHICH IS A SEQUENTIAL DATASET RESIDING
    ON A 2314 DEVICE.

THE PROCEDURE SPECIFIES:

3.  PRINT THE SEGMENT AND DATASET TABLES. NOTE THAT EACH SEGMENT
    IS LISTED TOGETHER WITH ITS FIELDS, AND THAT TOTAL SEGMENT
    SIZES HAVE BEEN COMPUTED. FURTHERMORE, WHEN THE DATASET
    PARAMETERS ARE PRINTED OUT, TOTAL (AVERAGE) RECORD SIZE HAS BEEN
    COMPUTED FROM THE SEGMENT SIZES (42+3*20+2*26=154). NOTE ALSO
    FROM THE EXTENT TABLE THAT "DS" REQUIRES TWO CYLINDERS OF 2314
    STORAGE.

4.  READ THE WHOLE DATASET, PRINT TIMING STATISTICS.
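
THE RECORD SIZE ARITHMETIC OF NOTE 3 CAN BE RETRACED WITH A SHORT PYTHON
SKETCH (ILLUSTRATION ONLY; THE FUNCTION NAMES ARE HYPOTHETICAL). THE
MASTER SEGMENT SIZE FOLLOWS FROM NOTE 1; THE JOB AND CHILD SEGMENT SIZES
OF 20 AND 26 ARE TAKEN FROM THE FORMULA IN NOTE 3, SINCE THEIR FIELD
SIZES ARE NOT LISTED IN THE NOTES.

```python
def segment_size(initial_size, field_sizes):
    """Segment size after input: initial size plus the sum of its field sizes."""
    return initial_size + sum(field_sizes)


def record_size(master_size, subordinates):
    """Average record size: master segment plus NPSS copies of each subordinate.

    subordinates is an iterable of (segment size, NPSS) pairs.
    """
    return master_size + sum(size * npss for size, npss in subordinates)


# Master segment: 10 unassigned bytes plus NAME (20), AGE (2), NO (10).
master = segment_size(10, [20, 2, 10])

# 3 JOB segments of 20 bytes and 2 CHLD segments of 26 bytes per master.
average_record = record_size(master, [(20, 3), (26, 2)])
```

THIS GIVES A MASTER SEGMENT OF 42 BYTES AND AN AVERAGE RECORD OF 154
BYTES, IN AGREEMENT WITH THE FIGURES ABOVE.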
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VII-S7 




*PROCEDURE

     PRINT     (SG,DS)
R    READS     DS,,R
     PTIME
     END


4.3 SPECIFY TWO INDEXED-SEQUENTIAL DATASETS, ONE WITH CYLINDER-
EMBEDDED OVERFLOW, THE OTHER WITH OVERFLOW ON A SEPARATE DEVICE.
READ 1000 RANDOM RECORDS FROM EACH DATASET AND TIME.


NOTES:

1.  THE HARDWARE CONSISTS OF SEVEN 2314 DISKS ON ONE CHANNEL.

2.  TWO DATASETS, "DS1" AND "DS2", ARE SPECIFIED. THEY EACH HAVE
    112200 80-CHARACTER RECORDS, AND ARE TO RESIDE ON 2314
    DEVICES.

    THE ACCESS-METHOD-RELATED PARAMETERS SPECIFY:

    (1) THE PRIME DATA AREA IS OCCUPIED TO 1.1 TIMES ITS CAPACITY;
        THAT IS, THE PRIME AREA IS FULL, AND AN EQUIVALENT OF 10%
        OF THE PRIME RECORDS IS STORED IN OVERFLOW. THIS MEANS
        THAT OF THE TOTAL NUMBER OF RECORDS IN THE DATASET
        (112200), 102000 ARE STORED ON PRIME TRACKS, AND 10200
        ARE STORED ON OVERFLOW TRACKS.

    (2) MASTER INDEXES ARE TO BE CREATED WHEN LOWER LEVEL
        INDEXES OCCUPY MORE THAN TWO TRACKS.

    (3) DISTRIBUTION "CLD" IS SPECIFIED AS THE DISTRIBUTION OF
        THE OVERFLOW CHAIN LENGTHS. IT SPECIFIES THAT 60% OF THE
        TRACKS THAT HAVE OVERFLOW RECORDS HAVE ONLY ONE, 30% HAVE
        TWO, AND 10% HAVE THREE.

3.  "DS1" HAS NO FILES EXPLICITLY SPECIFIED, SO ITS PRIME (PR),
    TRACK INDEX (TI), OVERFLOW (OF), CYLINDER INDEX (CI) AND
    MASTER INDEXES (MI1, MI2, MI3), IF NEEDED, WILL BE SPECIFIED
    BY DEFAULT, AND WILL RESIDE ON 2314 DEVICES. "PR", "TI", AND
    "OF" FILES WILL SHARE CYLINDERS, AND RECORD LENGTHS FOR THE
    INDEX FILES WILL BE 10 BYTES. KEY LENGTH HAS NOT BEEN
    SPECIFIED ON THE "PARAM" CARD, SO IT IS TAKEN BY DEFAULT TO BE 10
    BYTES.

4.  "DS2" DIFFERS FROM "DS1" IN THAT A SPECIFIC ASSIGNMENT OF THE
    OVERFLOW FILE HAS BEEN MADE TO DEVICE "DEVS", THUS PROVIDING
    A SEPARATE OVERFLOW AREA.

5.  A LIST CALLED "LIST" IS PROVIDED FOR USE BY THE PROCEDURE. IT
    IS SPECIFIED TO BE A RANDOM LIST OF 1000 INTEGER VALUES, TO
    BE CHOSEN FROM THE RANGE 1-112200 (THE RANGE OF RECORD NUMBERS
    ON "DS1" AND "DS2").

THE PROCEDURE SPECIFIES:

6.  PRINT THE DATASET PARAMETERS. NOTE THE DEFAULT SPECIFICATIONS
    FOR PRIME, CYLINDER INDEX, TRACK INDEX AND (FOR "DS1")
    OVERFLOW FILES THAT HAVE BEEN SUPPLIED BY THE SYSTEM.

7.  RESET A TIMER TO ZERO.

8.  USING RANDOM ACCESSING, READ FROM "DS1" THE RECORDS SPECIFIED
    BY LIST "LIST". PRINT TIMING STATISTICS.

9.  INITIALIZE "LIST" TO ITS ORIGINAL STATE AS DEFINED IN THE
    LIST SPECIFICATION.

10. RESTORE THE SYSTEM TO TIME 0.

11. READ FROM "DS2" AND TIME. NOTE THAT THE TIME TO READ FROM
    "DS2" IS NOT SIGNIFICANTLY DIFFERENT FROM THE TIME TAKEN TO
    READ FROM "DS1". THE EXTRA TIME NEEDED FOR "DS2" TO PERFORM
    OVERFLOW SEEKS (ITS OVERFLOW RECORDS ARE SPREAD RANDOMLY OVER
    TWELVE CYLINDERS) IS OFFSET BY THE FACT THAT THE PRIME RECORDS
    OF "DS2" ARE SPREAD OVER ONLY 179 CYLINDERS, AS OPPOSED TO 200
    FOR "DS1".
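
THE ARITHMETIC BEHIND THE ACCESS-METHOD PARAMETERS IN NOTE 2 CAN BE
RETRACED WITH THE FOLLOWING PYTHON SKETCH (ILLUSTRATION ONLY). THE
FUNCTION NAME "split_prime_overflow" IS HYPOTHETICAL; THE MEAN CHAIN
LENGTH IS A DERIVED FIGURE THAT DOES NOT APPEAR IN THE NOTES.

```python
from fractions import Fraction


def split_prime_overflow(total_records, occupancy):
    """Split a dataset into prime and overflow record counts.

    occupancy is total records divided by prime capacity, passed as a
    string such as "1.1" so that the arithmetic stays exact.
    """
    prime = int(total_records / Fraction(occupancy))
    return prime, total_records - prime


prime, overflow = split_prime_overflow(112200, "1.1")

# Distribution "CLD": chain length -> share of the tracks having overflow.
cld = {1: Fraction(60, 100), 2: Fraction(30, 100), 3: Fraction(10, 100)}
mean_chain_length = sum(length * share for length, share in cld.items())
```

UNDER THESE FIGURES THE SPLIT IS 102000 PRIME AND 10200 OVERFLOW
RECORDS, AND THE MEAN OVERFLOW CHAIN LENGTH IS 1.5 RECORDS.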
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♦LISTS 

LIST  LIST  RL,I,1000,1,112200

END

4.4 CONSTRUCT TWO SEQUENTIAL DATASETS THAT SHARE CYLINDERS ON A 2314
DEVICE. CONSTRUCT TWO MORE DATASETS WITH THE SAME CHARACTERISTICS
BUT OCCUPYING SEPARATE EXTENTS OF A 2314 DEVICE. MERGE THE FIRST
TWO DATASETS, TIME, THEN MERGE THE SECOND TWO DATASETS AND TIME.


NOTES: 

1. THREE 2314 DISK DRIVES ON TWO CHANNELS ARE SPECIFIED.

2. "DS1" IS SPECIFIED AS A SEQUENTIAL DATASET WITH FULL-TRACK
(7294 BYTES) RECORDS TO BE STORED ON 2314 DEVICES. ITS
ASSOCIATED FILE, "FL1", IS ALLOCATED ACROSS 200 CYLINDERS,
OCCUPYING 15 TRACKS ON EACH CYLINDER. THE NUMBER OF RECORDS OF
"DS1" IS NOT SPECIFIED, BUT IS TO BE INFERRED FROM THE AMOUNT OF
SPACE ALLOCATED.

3. "DS2" IS ANOTHER SEQUENTIAL DATASET. ITS ASSOCIATED FILE
SHARES EXTENTS WITH "FL1", OCCUPYING FIVE TRACKS PER
CYLINDER.

4. "DS3" AND "DS4" ARE DATASETS SIMILAR TO "DS1" AND "DS2",
DIFFERING ONLY IN THAT INSTEAD OF SHARING CYLINDERS ACROSS A
DEVICE, THEY OCCUPY FULL CYLINDERS IN SEPARATE EXTENTS ON THE
SAME DEVICE.

5. "DS5" IS AN OUTPUT DATASET FOR THE MERGE OPERATIONS.

THE PROCEDURE SPECIFIES:

6. PRINT DATASET PARAMETERS. NOTE THAT THE NUMBER OF RECORDS IN
EACH DATASET HAS BEEN COMPUTED FROM FILE AND EXTENT
PARAMETERS.

7. "SYNCHRONIZE" OPERATIONS AT LABELS "RD1" AND "RD2" IN THE
SENSE THAT A TRANSFER TO "RD1" OR "RD2" WILL RESULT IN AN
"INDETERMINATE TRANSFER" TO ONE OF THE LABELS, ON THE BASIS
THAT THE LISTS SPECIFIED AT THE TWO LABELS SHOULD BE EXHAUSTED
AT APPROXIMATELY THE SAME TIME. THIS, IN EFFECT, SIMULATES
MERGE-TYPE OPERATIONS BY INTERLEAVING READS ON "DS1" AND
"DS2", BUT IN A RANDOM FASHION. THE SPECIFICATION "END1"
INDICATES THAT WHEN BOTH LISTS ARE EMPTY, CONTROL TRANSFERS
TO LABEL "END1".

8. THE "READS", "READS", "WRITS" SEQUENCE SPECIFIES A SEQUENTIAL
READ FROM "DS1" OR "DS2", FOLLOWED BY A SEQUENTIAL WRITE TO
"DS5". THIS SEQUENCE IS REPEATED UNTIL "DS1" AND "DS2" HAVE
BEEN READ IN THEIR ENTIRETY.

9. PRINT TIMING STATISTICS, RESTORE THE SYSTEM TO TIME 0., AND
REPEAT WITH DATASETS "DS3" AND "DS4".

10. NOTE THAT THE SECOND OPERATION TAKES CONSIDERABLY LONGER THAN
THE FIRST, AS EXPECTED, SINCE A SIGNIFICANT AMOUNT OF ARM
MOVEMENT TAKES PLACE WHEN MOVING BETWEEN "DS3" AND "DS4".

THE PHASE II DATA MANAGEMENT SIMULATION


*EXECUTE

TABLE INTERPRETATION ERRORS


END OF PROCEDURE. 16007 STATEMENTS EXECUTED
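
THE "INDETERMINATE TRANSFER" OF NOTE 7 IN EXAMPLE 4.4 CAN BE
ILLUSTRATED WITH A SHORT SKETCH. THE PYTHON BELOW IS NOT THE PHASE II
IMPLEMENTATION. IT ASSUMES THAT THE LABEL IS CHOSEN AT RANDOM WITH
PROBABILITY PROPORTIONAL TO THE NUMBER OF ENTRIES REMAINING ON EACH
LIST, WHICH IS ONE SIMPLE WAY TO MAKE BOTH LISTS RUN OUT AT ABOUT THE
SAME TIME. THE FUNCTION NAME AND THE RECORD COUNTS (3000 AND 1000,
INFERRED FROM 200 CYLINDERS AT 15 AND 5 FULL-TRACK RECORDS PER
CYLINDER) ARE ASSUMPTIONS MADE FOR ILLUSTRATION.

```python
import random


def interleave(n1, n2, seed=0):
    """Return a random merge order: 1 = read at RD1, 2 = read at RD2.

    Assumption: each transfer picks a label with probability
    proportional to the entries still left on that label's list.
    """
    rng = random.Random(seed)
    left = {1: n1, 2: n2}
    order = []
    while left[1] + left[2] > 0:
        total = left[1] + left[2]
        pick = 1 if rng.random() * total < left[1] else 2
        left[pick] -= 1
        order.append(pick)
    return order


order = interleave(3000, 1000)
# Every record of both lists is read exactly once.
print(len(order), order.count(1), order.count(2))
```

EACH ELEMENT OF THE RESULT STANDS FOR ONE READ, WHICH THE PROCEDURE
WOULD FOLLOW WITH ONE SEQUENTIAL WRITE TO "DS5".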


4.5 DEFINE A HIERARCHICALLY STRUCTURED DATABASE; ASSIGN IT TO THREE
DATASETS, TWO SEQUENTIAL AND ONE INDEX-SEQUENTIAL (NO OVERFLOW).
DEFINE A SET OF QUALIFICATIONS, AND ACCESS THE DATASETS BASED ON
THESE QUALIFICATIONS.


NOTES:


1. THE SEGMENT HIERARCHY AND ASSIGNMENT TO DATASETS IS
GRAPHICALLY DESCRIBED IN FIGURE 4.5.1.


FIGURE 4.5.1 - SEGMENT HIERARCHY AND ASSIGNMENT TO DATASETS
("S1", ON DATASET "DS1", AT THE TOP LEVEL; "S2" AND "S3" AT THE
SECOND LEVEL; "S4", "S5", AND "S6" AT THE THIRD LEVEL; PART OF THE
HIERARCHY ASSIGNED TO DATASET "DS2")


2. SOME OF THE SEGMENTS HAVE FIELDS ASSIGNED TO THEM WHOSE
VALUES, IN TURN, ARE CHARACTERIZED BY DISTRIBUTIONS; FOR
EXAMPLE, FIELD "4.1" IS CHARACTERIZED BY DISTRIBUTION "A2",
WHICH SPECIFIES THAT THE FIELD CONTAINS ONE OF TWO VALUES:
"B1", WHICH OCCURS 90% OF THE TIME, AND "B2", WHICH OCCURS 10%
OF THE TIME. NOTE THAT UNLIKE "A2", THE OTHER DISTRIBUTIONS
HAVE BEEN SPECIFIED IN CUMULATIVE FORM, A USER OPTION.

FIELDS "2.1" AND "7.1" ARE "SORT" FIELDS. NOTE THAT EACH OF
THESE FIELDS IS IN THE HIGHEST LEVEL SEGMENT OF ITS DATASET.
THE RECORDS OF DATASETS "DS2" AND "DS3" WILL BE ORDERED ON
THESE FIELDS.

3. THE QUALIFICATION SPECIFICATION ILLUSTRATES THE THREE TYPES OF
QUALIFICATION STATEMENT:

"Q1" (A FIELD QUALIFICATION) STATES THAT AN "S2" SEGMENT
QUALIFIES BY "Q1" IF FIELD "4.1" HAS VALUE "B2".

"Q3" (A BOOLEAN QUALIFICATION) STATES THAT AN "S2" SEGMENT
QUALIFIES BY "Q3" IF IT QUALIFIES BOTH BY "Q1" AND "Q2".

"Q5" (A SEGMENT QUALIFICATION) STATES THAT AN "S1" SEGMENT
QUALIFIES IF IT HAS EXACTLY ONE ("EQ,1") "S2" SEGMENT
SUBORDINATE TO IT THAT QUALIFIES BY "Q1".

4. FOR ILLUSTRATIVE PURPOSES (THEY ARE NOT USED), TWO LISTS BASED
ON QUALIFICATION ARE SPECIFIED. "L1S" IS TO BE A SEQUENTIAL
LIST BASED ON QUALIFICATION "Q1"; THAT IS, THE LIST ELEMENTS
ARE RECORD NUMBERS OF RECORDS QUALIFIED BY "Q1", AND ARE
ARRANGED IN SORT ORDER. "L1R" DIFFERS FROM "L1S" IN THAT THE
RECORD NUMBERS ARE ARRANGED IN RANDOM ORDER.

THE PROCEDURE SPECIFIES:

5. PRINT THE SEGMENT TABLES.

6. PRINT THE DATASET TABLES. NOTE THAT "DS1", "DS2", AND "DS3"
HAVE 100000, 300000, AND 10000 RECORDS, RESPECTIVELY, WHICH
ARE THE NUMBER OF SEGMENTS "S1", "S2", AND "S7", RESPECTIVELY.
(EACH OF THESE SEGMENTS IS THE HIGHEST LEVEL SEGMENT ON ITS
DATASET). NOTE ALSO THAT DEFAULT "PARAM" PARAMETERS ("PPF",
"NMI", "KL", AND "CLD") HAVE BEEN SUPPLIED FOR INDEX-SEQUENTIAL
DATASET "DS2". AS A RESULT, "DS2" CONSISTS ONLY OF PRIME,
TRACK INDEX, AND CYLINDER INDEX FILES.

7. PRINT QUALIFICATION TABLES. IN ADDITION TO THE INPUT
INFORMATION, THESE TABLES ALSO SUPPLY SOME RESULTING
QUALIFICATION INFORMATION: "LRQ" WHICH IS THE "LOW RECORD
QUALIFYING", "HRQ" WHICH IS THE "HIGH RECORD QUALIFYING", AND
"NRQ" WHICH IS THE NUMBER OF RECORDS QUALIFYING ON THE INTERVAL
("LRQ","HRQ").

"Q1" IS A FIELD QUALIFICATION ON FIELD "4.1", WHICH IS IN
SEGMENT "S4" WHICH RESIDES ON DATASET "DS2" - HENCE "Q1"
QUALIFIES 57000 RECORDS ON "DS2" RANDOMLY OCCURRING OVER THE
WHOLE DATASET. THIS NUMBER IS ARRIVED AT AS FOLLOWS:

FIELD "4.1" EQUALS "B2" 10% OF THE TIME, HENCE 10% OF THE
"S4" SEGMENTS QUALIFY. A RECORD OF "DS2" IS PRESUMED TO
QUALIFY BY "Q1" IF IT HAS AT LEAST ONE "S4" SEGMENT THAT
QUALIFIES. SINCE THERE ARE TWO "S4" SEGMENTS PER RECORD OF
"DS2", THE PROBABILITY THAT A RECORD OF "DS2" DOES NOT QUALIFY
IS:

(1 - .10)**2

AND THE PROBABILITY THAT IT DOES IS:

1 - (1 - .10)**2 = .19

HENCE THERE ARE (.19)*(300000) = 57000 RECORDS OF "DS2" THAT
QUALIFY BY "Q1".

"Q2", "Q3", AND "Q4" ARE ALSO QUALIFICATIONS ON "DS2". "Q5" IS
A QUALIFICATION ON "DS1", AND "Q6", "Q7", AND "Q8" ARE
QUALIFICATIONS ON "DS3". THE LATTER THREE INVOLVE A SORT FIELD,
HENCE QUALIFY ONLY OVER A SUBSET OF THE WHOLE DATASET RECORD
RANGE.

"Q4" AND "Q5" ARE NOT USED BY THE PROCEDURE, BUT ARE INCLUDED
TO ILLUSTRATE THE "OR" BOOLEAN AND SEGMENT QUALIFICATIONS,
RESPECTIVELY.

8. PRINT LIST TABLES. NOTE THE ENTRIES FOR "L1S" AND "L1R". EACH
CONSISTS OF 57000 INTEGERS BETWEEN 1 AND 300000, WHICH IS THE
SET OF RECORD NUMBERS QUALIFIED BY "Q1". "L1S" IS A SEQUENTIAL
LIST ("SQ") AND "L1R" IS A RANDOM LIST ("RQ").

LIST "//1" IS AN "EMBEDDED" LIST; THAT IS, IT WAS EMBEDDED IN
PROCEDURE STATEMENT 1.

LISTS "//2" AND "//3" REPRESENT LISTS IMPLIED BY PROCEDURE
STATEMENTS LABELLED "X" AND "Y", RESPECTIVELY. LIST "//2" IS
A SEQUENTIAL LIST, SINCE THE OPERATION AT "X" IS A "READS"
(READ SEQUENTIAL), AND IS BASED ON QUALIFICATION "Q8", AS
SPECIFIED BY THE MODIFIER AND OBJECT AT "X". SIMILARLY, "//3"
IS A RANDOM LIST BASED ON QUALIFICATION "Q3".

9. SET A TIMER TO ZERO, READ (USING SEQUENTIAL ACCESS) THE
RECORDS QUALIFIED BY "Q8", AND PRINT TIMING STATISTICS. REPEAT
FOR QUALIFICATION "Q3", USING RANDOM ACCESS (ISAM).

10. NOTE THAT 5500 PROCEDURE STATEMENTS HAVE BEEN EXECUTED. THIS
NUMBER CAN BE BROKEN DOWN AS FOLLOWS:

2992+2500  TO READ QUALIFIED RECORDS

2          "READS" AND "READR" WHICH FAILED (DUE TO
           END-OF-LIST)

6          OTHER PROCEDURE STATEMENTS (INCLUDING "END")

11. NOTE THAT THE "READS" TOOK AN AMOUNT OF TIME NEEDED TO READ
114 TRACKS OF 2314 STORAGE (29*5 = 25*114 APPROX) WHICH
ACCOUNTS FOR READING FROM THE BEGINNING OF DATASET "DS3" DOWN
TO RECORD 5000.
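
THE RECORD COUNT DERIVED IN NOTE 7 CAN BE CHECKED WITH A FEW LINES OF
PYTHON. THIS IS ONLY A SKETCH OF THE ARITHMETIC GIVEN IN THE NOTE; THE
FUNCTION NAME AND ARGUMENT NAMES ARE ILLUSTRATIVE AND DO NOT
CORRESPOND TO ANY PHASE II ROUTINE. IT ASSUMES, AS THE NOTE DOES, THAT
THE SEGMENTS OF A RECORD QUALIFY INDEPENDENTLY OF ONE ANOTHER.

```python
def qualifying_records(p_segment, segments_per_record, n_records):
    """Expected number of records with at least one qualifying segment."""
    # Probability that none of the record's segments qualifies.
    p_none = (1.0 - p_segment) ** segments_per_record
    p_record = 1.0 - p_none
    return p_record, round(p_record * n_records)


# Field "4.1" equals "B2" 10% of the time; two "S4" segments per
# record of "DS2"; 300000 records on "DS2".
p, n = qualifying_records(0.10, 2, 300000)
print(round(p, 2), n)
```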
ERROR CONDITIONS


EACH ERROR THAT CAN BE DETECTED BY PHASE II HAS A UNIQUE ERROR
CODE ASSIGNED TO IT, AND FALLS INTO ONE OF SEVEN ERROR CLASSES.

EACH ERROR CLASS IS ASSOCIATED WITH A TYPE OF ERROR, AND DETERMINES
THE DISPOSITION OF THE ERROR CONDITION. THE CLASSES ARE:


CLASS 0 - WARNING - CONTINUE PROCESSING

CLASS 1 - ERROR DURING PROCEDURE EXECUTION - GO TO ERROR LABEL
          IN PROCEDURE, IF POSSIBLE. OTHERWISE, DUMP ALL
          TABLES AND FLUSH DOWN TO NEXT CONTROL CARD.

CLASS 2 - ERROR DETECTED IN INPUT - FLUSH DOWN TO NEXT CONTROL
          CARD, DELETE EXECUTION OF THE PROCEDURE

CLASS 3 - ERROR DETECTED DURING INTERPRETATION OF TABLES -
          CONTINUE, BUT DELETE EXECUTION OF THE PROCEDURE

CLASS 4 - SYSTEM ERROR - DUMP ALL TABLES, FLUSH DOWN TO NEXT
          CONTROL CARD, DELETE EXECUTION OF THE PROCEDURE

CLASS 5 - CATASTROPHIC ERROR - ABEND

CLASS 6 - MILD ERROR IN INPUT - CONTINUE, DELETE EXECUTION
          OF THE PROCEDURE
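
THE SEVEN DISPOSITIONS CAN BE SUMMARIZED AS A SMALL LOOKUP. THE PYTHON
BELOW IS A SKETCH ONLY: THE ENUMERATION NAMES, THE FUNCTION NAME, AND
THE ACTION STRINGS ARE INVENTED FOR ILLUSTRATION AND ARE NOT TAKEN
FROM THE PHASE II FORTRAN CODE.

```python
from enum import IntEnum


class ErrorClass(IntEnum):
    WARNING = 0
    EXECUTION = 1
    INPUT = 2
    INTERPRETATION = 3
    SYSTEM = 4
    CATASTROPHIC = 5
    MILD_INPUT = 6


def disposition(error_class, has_error_label=False):
    """Return the list of actions taken for an error of the given class."""
    c = ErrorClass(error_class)
    if c is ErrorClass.WARNING:
        return ["continue"]
    if c is ErrorClass.EXECUTION:
        # Class 1 transfers to the procedure's error label when one exists.
        if has_error_label:
            return ["go to error label"]
        return ["dump tables", "flush to next control card"]
    if c is ErrorClass.INPUT:
        return ["flush to next control card", "delete procedure execution"]
    if c is ErrorClass.SYSTEM:
        return ["dump tables", "flush to next control card",
                "delete procedure execution"]
    if c is ErrorClass.CATASTROPHIC:
        return ["abend"]
    # Classes 3 and 6 both continue but suppress procedure execution.
    return ["continue", "delete procedure execution"]


print(disposition(1, has_error_label=True))
print(disposition(4))
```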


THE FOLLOWING IS A LIST AND DESCRIPTION OF ALL ERRORS DETECTABLE
BY THE SYSTEM:


CODE  DESCRIPTION                                           CLASS  ROUTINE

   1  ATTEMPT TO REOPEN AN ALREADY OPEN FILE                  1    OPEN

   2  ATTEMPT TO OPEN TOO MANY FILES AT ONCE                  1    OPEN

   3  ATTEMPT TO CLOSE A FILE THAT IS NOT OPEN                1    CLOSE

   4  ATTEMPT TO CLOSE A FILE WHICH HAS I/O REQUESTS          1    CLOSE
      PENDING THAT ARE NOT KNOWN TO THE SYSTEM

   5  ATTEMPT TO OPEN SEQUENTIAL FILE WITH BUFFER TYPE        1    OPEN
      OTHER THAN 'M', 'L' OR BLANK

   6  NOT ENOUGH DEVICES OF A REQUIRED TYPE TO SATISFY ALL    3    ALLOC
      SPACE REQUESTS

   7  DEVICE REFERRED TO BY FILE DOES NOT EXIST               3    ALLOC

   8  FILES SHARE CYLINDERS, BUT THEIR REQUIREMENTS DO NOT    3    ALLOC
      MATCH

   9  REQUESTED RECORD NOT IN FILE                            1    LOCATE

  10  NO DEVICE CLASS ASSIGNED TO FILE                        3    IFL

  11  FILE REFERRED TO BY FILE WITH WHICH IT SHARES           3    IFL
      EXTENTS DOES NOT EXIST

  12  DEVICE ASSIGNED TO FILE DOES NOT EXIST                  3    IFL

  13  DEVICE CLASS ASSIGNED TO FILE DOES NOT EXIST            3    IFL

  14  UNACCEPTABLE LIST INPUT:                                2    RLI
      (1) LIST TABLE FULL
      (2) LITERAL LIST CONTENTS TABLE FULL
      (3) LITERAL LIST CONTENTS KEYWORD NOT BLANK
      (4) INCORRECT KEYWORD

  15  ATTEMPT TO USE AN ILLEGALLY SPECIFIED LIST              1    GET

  16  SAME                                                    1    IGET

  17  A REAL OR ALPHAMERIC LIST HAS BEEN SPECIFIED - AN       1    IGET, GET
      INTEGER LIST IS REQUIRED, OR VICE-VERSA

  18  LIST TYPE INCORRECTLY SPECIFIED                         0    ILI

  19  LIST INCORRECTLY SPECIFIED:                             0    ILI
      (1) ALPHAMERIC LIST HAS BEEN SPECIFIED AS SEQUENTIAL
      (2) RANDOM/SEQUENTIAL LIST HAS BEEN SPECIFIED AS
          REAL OR ALPHAMERIC
      (3) RANDOM/SEQUENTIAL LIST HAS BEEN SPECIFIED WITH
          MORE ENTRIES THAN ITS INTERVAL PROVIDES

  20  LIST TABLE FULL                                         6    CREATE

  21  ATTEMPT TO DEFINE TOO MANY CREATED LISTS                6    CREATE

  22  ATTEMPT TO "PUT" INTO LIST WITH ILLEGAL MODE            4    PUT

  23  OUT OF LITERAL LIST ENTRY SPACE                         2    PUT

  24  DISTRIBUTION SPECIFIED FOR RANDOM LIST DOES NOT         0    ILI
      EXIST

  25  NON-UNIFORM DISTRIBUTION SPECIFIED FOR                  0    ILI
      RANDOM/SEQUENTIAL LIST

  26  ILLEGAL "PUT" - LIST ALREADY CREATED                    4    PUT

  27  ATTEMPT TO READ BEYOND END-OF-DATA                      1    READS

  28  ATTEMPT TO WRITE TOO MANY RECORDS TO FILE                    WRITS

  29  ILLEGAL I/O REQUEST:
      (1) READ ON FILE OPENED FOR WRITE                            READS
      (2) WRITE ON FILE OPENED FOR READ                            WRITS
      (3) UPDATE ON CLOSED FILE, OR ONE NOT OPEN FOR READ          UPDATS
          IN LOCATE MODE

  30  INPUT LINE EXCEEDS 130 CHARACTERS                            INTERP

  31  MASTER HAS ILLEGAL TYPE FIELD                                INTERP

  32  ILLEGAL CHARACTER IN INPUT FIELD, OR ALPHAMERIC              INTERP
      FIELD EXCEEDS FOUR CHARACTERS

  33  PROCEDURE HAS TOO MANY STEPS                                 RPR

  34  UNRECOGNIZED OPERATION IN PROCEDURE                          RPR

  35  TOO MANY UNRECOGNIZED OPERATIONS IN PROCEDURE -              RPR
      APPARENTLY WHAT IS BEING READ IS NOT A PROCEDURE

  36  INPUT/OUTPUT QUEUE TABLE OVERFLOW - TOO MANY                 AC
      PENDING I/O REQUESTS WITHOUT INTERVENING WAITS

  37  PENDING I/O REQUEST IN BUFFER NOT RECOGNIZED                 RESET
      BY SYSTEM

  38  ERROR IN DISTRIBUTION INPUT:                                 RDI
      (1) TOO MANY DISTRIBUTIONS SPECIFIED
      (2) TOO MANY ENTRIES IN DISTRIBUTION CONTENTS TABLE
      (3) INCORRECT KEYWORD

  39  DISTRIBUTION INCORRECTLY SPECIFIED:                          RDI
      (1) MODE NOT A, R, OR I
      (2) TYPE NOT D OR C
      (3) FIRST VALUE OF CONTINUOUS DISTRIBUTION NOT 0.
      (4) DISTRIBUTION DOES NOT ACCUMULATE TO 1.

  40  ALPHAMERIC DISTRIBUTION SPECIFIED AS "C"; "D"                RDI
      SUBSTITUTED

  41  ATTEMPT TO USE AN INCORRECTLY SPECIFIED DISTRIBUTION.
      ROUTINES: DISTV, DISTC, DISTA, IDISTA, DIST, IDIST

  42  ILLEGAL USE OF IDIST FUNCTION:                               IDIST
      (1) REAL OR ALPHAMERIC DISTRIBUTION - INTEGER
          DISTRIBUTION REQUIRED
      (2) LOW OR HIGH VALUES DO NOT MATCH DEFINED VALUES
          OF DISCRETE DISTRIBUTION
      (3) LOW OR HIGH VALUES NOT IN RANGE OF INTERPOLATE
          DISTRIBUTION

  43  SYSTEM ERROR - DISCRETE DISTRIBUTION                         IDIST

  44  SYSTEM ERROR - CONTINUOUS DISTRIBUTION                  4    IDIST

  45  SYSTEM ERROR - DISCRETE DISTRIBUTION                    4    DIST

  46  SYSTEM ERROR - CONTINUOUS DISTRIBUTION                  4    DIST

  47  ILLEGAL USE OF DIST FUNCTION:                           1    DIST
      (1) INTEGER DISTRIBUTION - REAL OR ALPHAMERIC
          REQUIRED
      (2)-(3) AS IN 42

  48  NOT USED

  49  GIVEN ARGUMENT DOES NOT MATCH A DEFINED ARGUMENT FOR    1    DISTV
      AN ALPHAMERIC DISTRIBUTION

  50  GIVEN ARGUMENT DOES NOT MATCH A DEFINED ARGUMENT FOR    1    DISTV
      AN INTEGER DISTRIBUTION

  51  ATTEMPT TO USE DISTV (FOR DISCRETE DISTRIBUTIONS)       1    DISTV
      ON A CONTINUOUS DISTRIBUTION

  52  GIVEN ARGUMENT DOES NOT MATCH A DEFINED ARGUMENT FOR    1    DISTV
      A REAL/DISCRETE DISTRIBUTION

  53  SAME AS 49                                              1    DISTC

  54  SAME AS 50                                              1    DISTC

  55  GIVEN ARGUMENT DOES NOT MATCH A TABULATED ARGUMENT      1    DISTC
      FOR A REAL/DISCRETE DISTRIBUTION, OR DOES NOT FALL
      WITHIN THE RANGE OF A CONTINUOUS DISTRIBUTION

  56  DISTRIBUTION REFERENCED DOES NOT HAVE INTEGER VALUES    1    IDISTA
      OR GIVEN VALUE OUT-OF-RANGE

  57  DISTRIBUTION REFERENCED DOES NOT HAVE REAL OR           1    DISTA
      ALPHAMERIC VALUES, OR GIVEN VALUE IS OUT-OF-RANGE

  58  ERROR IN HARDWARE INPUT:                                2    RHD
      (1) TOO MANY CHANNELS, CONTROL UNITS, OR DEVICES
          SPECIFIED
      (2) TOO MANY CONTROL UNITS ASSIGNED TO ONE CHANNEL
      (3) TOO MANY CHANNELS ASSIGNED TO ONE CONTROL UNIT
      (4) INCORRECT KEYWORD
      (5) TOO MANY DEVICES ASSIGNED TO ONE CONTROL UNIT

  59  ERROR IN DEVICE CLASS INPUT:                            2    RDP
      (1) TOO MANY DEVICE CLASSES
      (2) INCORRECT KEYWORD

  60  TABLE REFERRED TO BY DEVICE CLASS ENTRY DOES NOT        3    IDP
      EXIST

  61  DEVICE CLASS REFERRED TO BY DEVICE ENTRY DOES NOT       3    IDV
      EXIST

  62  ERROR IN TABLE INPUT:                                   2    RTB
      (1) TOO MANY TABLE ENTRIES
      (2) TOO MANY TABLE CONTENTS ENTRIES
      (3) INCORRECT KEYWORD

  63  ERROR ON CALL TO TABLE-LOOK-UP:                         1    TABLE, ITABLE
      (1) TABLE ARGUMENT TYPE OR VALUE TYPE NOT I, R, OR A.
      (2) FUNCTION TYPE NOT I OR S

  64  FOR A FUNCTION WHOSE VALUES OR ARGUMENTS ARE            1    TABLE, ITABLE
      ALPHAMERIC, THE GIVEN ARGUMENT DOES NOT MATCH
      A TABULATED ARGUMENT

  65  ERROR IN DATASET INPUT:                                 2    RDS
      (1) TOO MANY DATASETS, FILES, OR EXTENTS
      (2) TOO MANY FILES ASSOCIATED WITH ONE DATASET
      (3) UNDEFINED DATASET TYPE
      (4) PARAM CARD SUPPLIED WITH SEQUENTIAL DATASET
      (5) INCORRECT KEYWORD

  66  INVALID DATASET TYPE - DATASET PRINT OR DUMP            0    PDS, DDS

  67  INVALID DATASET TYPE                                    3    IDS

  68  ATTEMPT TO EXECUTE PROCEDURE STATEMENT HAVING A         1    EXPR
      PREVIOUSLY DETECTED ERROR

  69  INVALID ERROR CODE                                      5    MAIN

  70  CONTROL CARD DOES NOT HAVE *, OR HAS ILLEGAL KEYWORD    2    MAIN

  71  SYSTEM CANNOT FIND "SYNC" OP ASSOCIATED WITH A SET      4    EXPR
      OF SYNCHRONIZED OPS

  72  ILLEGAL OP NUMBER IN PROCEDURE                          4    EXPR, IPR

  73  SAME                                                    4    AUXPRI, AUXPRE

  74  NOT USED

  75  LIST REFERRED TO IN PROCEDURE NOT FOUND                 0    IPR

  76  LABEL REFERRED TO BY PROCEDURE NOT FOUND                0    IPR

  77  LABEL REFERRED TO BY "SYNC" NOT FOUND                   1    EXPR

  78  ERROR IN SEGMENT INPUT:                                 2    RSG
      (1) TOO MANY SEGMENTS OR FIELDS
      (2) INCORRECT KEYWORD

  79  REFERENCED SUPERIOR SEGMENT DOES NOT EXIST              3    ISG

  80  DATASET REFERRED TO BY SEGMENT DOES NOT EXIST           3    ISG

  81  NUMBER-PER-SUPERIOR-SEGMENT FIELD IN SEGMENT INPUT      3    ISG
      IS ZERO

  82  A SEGMENT IS SUPERIOR TO ITSELF                         3    ISG

  83  DISTRIBUTION REFERRED TO BY FIELD CANNOT BE FOUND       3    IFD

  84  DATASET ASSOCIATED WITH SECONDARY INDEX CANNOT BE       3    IFD
      FOUND

  85  SEGMENTS ON SAME DATASET DO NOT HAVE THE SAME           3    ISG
      DATASET MASTER SEGMENT

  86  A KEY FIELD IS IN A SEGMENT WHICH IS NOT A DATASET      3    ISG
      MASTER

  87  THERE IS MORE THAN ONE SEQUENTIAL KEY FIELD ON          3    ISG
      THE SAME DATASET

  88  ILLEGAL TYPE IN MASTER - SYSTEM ERROR WHICH RESULTS     0    HEAD
      ONLY IN GARBLED OUTPUT

  89  ERROR IN QUALIFICATION INPUT:                           2    RQU
      (1) TOO MANY QUERIES
      (2) INCORRECT KEYWORD

  90  ILLEGAL TYPE - CONVERT ARGUMENT                         4    CONVER

  91  ILLEGAL CHARACTER IN NUMERIC VALUE                      3    CONVER

  92  ILLEGAL RELATIONAL OPERATOR IN QUALIFICATION            3    IQU

  93  FIELD REFERRED TO BY QUALIFICATION DOES NOT EXIST       3    IQU

  94  SEGMENT REFERRED TO BY QUALIFICATION DOES NOT EXIST     3    IQU

  95  QUALIFICATION REFERRED TO BY QUALIFICATION DOES NOT     3    IQU
      EXIST

  96  QUALIFICATIONS FORM A CIRCULAR DEFINITION               3    IQU

  97  QUALIFICATIONS REFERRED TO BY A BOOLEAN                 3    IQU
      QUALIFICATION DO NOT QUALIFY THE SAME SEGMENT

  98  "SEG" AND THE SEGMENT QUALIFIED BY "Q3" ARE NOT         3    IQU
      LINEALLY RELATED. SEE DESCRIPTION OF SEGMENT
      QUALIFICATION.

  99  THE "NO" PARAMETER IS IMPROPERLY SPECIFIED IN A         3    IQU
      SEGMENT QUALIFICATION. IT MUST BE A NON-NEGATIVE
      INTEGER NOT GREATER THAN THE NUMBER OF SEGMENTS
      "SEG" PER SEGMENT QUALIFIED BY "Q3". SEE DESCRIPTION
      OF SEGMENT QUALIFICATION, SECTION 3.3.

 100  QUALIFICATION REFERRED TO BY LIST (BASED ON             3    ILI
      QUALIFICATION) DOES NOT EXIST

 101  TOO MANY ISAM DATASETS                                  6    RXIS

 102  SAME                                                    3    IXIS

 103  NOT USED

 104  CONTRADICTORY PRIME, OVERFLOW, AND TRACK INDEX          3    IXIS
      PARAMETERS SPECIFIED

 105  RANDOM ACCESS TO NON-RANDOM DATASET                     1    READR

 106  SAME                                                    1    WRITR

 107  NO SPACE LEFT IN FILE TABLE FOR FILE CREATED BY         3    CREATE
      DEFAULT

 108  A FILE REQUIRES TOO MANY EXTENTS                        3    ALLOC

 109  OVERFLOW-CHAIN-LENGTH-DISTRIBUTION REFERRED TO BY       3    IXIS
      ISAM PARAMETER LIST DOES NOT EXIST

 110  DATASET REFERRED TO BY ACCESS OP IN PROCEDURE DOES      0    IPR
      NOT EXIST

 111  SAME FOR FIELD                                          0    IPR

 112  SAME FOR QUALIFICATION                                  0    IPR

 113  ACCESS OP IN PROCEDURE HAS ILLEGAL MODIFIER FIELD       0    IPR

 114  NUMBER OF BUFFERS IN AN "OPEN" ACCESS OP IS ILLEGAL     1    EXPR

 115  INCORRECT LOGICAL FILE FOR OUTPUT IN PRINT, DUMP, OR    0    IPR
      TRACE PROCEDURE STATEMENT

 116  NO LIST SPECIFIED IN PRINT, DUMP, OR TRACE PROCEDURE    0    IPR
      STATEMENT

 117  TOO MANY TIMERS REQUIRED BY PROCEDURE                   0    IPR

 118  ERROR LABEL SPECIFIED BY "ERROR" PROCEDURE STATEMENT    0    IPR
      DOES NOT EXIST

 119  ATTEMPT TO UPDATE A DATASET OF A TYPE NOT UPDATABLE     1    EXPR

 120  ATTEMPT TO CREATE TOO MANY FILES IN ONE DATASET         3    IXIS

 121  EXTENT BOUNDS INCOMPATIBLE WITH NO. OF CYLINDERS ON     3    ALLOC
      THE DEVICE

 122  DISTRIBUTION TABLE FULL                                 3    CREATD

 123  ATTEMPT TO CREATE TOO MANY DEFAULT DISTRIBUTIONS        3    CREATD

 124  ATTEMPT TO "PUTD" INTO DISTRIBUTION WITH INVALID        4    PUTD
      MODE SPECIFICATION

 125  OUT OF DISTRIBUTION CONTENTS SPACE                      3    PUTD

 126  ILLEGAL "PUTD" - DISTRIBUTION ALREADY CREATED           4    PUTD

 127  RESERVED

 128  RESERVED

 129  RESERVED

 130  RESERVED

 131  RESERVED

 132  RESERVED

 133  RESERVED

 134  NUMBER OF OUTSTANDING REQUESTS ON A DIRECT ACCESS       1    READD
      DATASET EXCEEDS THE NUMBER OF BUFFERS AVAILABLE

 135  SAME                                                    1    WRITD

 136  A WAIT HAS BEEN ISSUED ON A FILE WHICH IS NOT OPEN      1    WAITD

 137  RESERVED

 138  RESERVED
6.0


MODEL SIZE LIMITATIONS


THE FOLLOWING ARE THE BUILT-IN LIMITATIONS TO THE SYSTEM DUE TO
TABLE SIZES:

ELEMENT                                   MAXIMUM NUMBER

OPEN FILES                                      20
LISTS                                           30
DEFAULT CREATED LISTS                           20
LITERAL LIST CONTENTS VALUES                   400
STEPS IN PROCEDURE                              50
PENDING I/O REQUESTS                            20
DISTRIBUTIONS                                   30
DISTRIBUTION (ARG,VAL) PAIRS                   400
CHANNELS                                         8
CONTROL UNITS                                   10
DEVICES                                         30
CONTROL UNITS ON ONE CHANNEL                    10
CHANNELS ATTACHED TO ONE CONTROL UNIT            4
TABLES                                          20
TABLE (ARG,VAL) PAIRS                          300
DATASETS                                        20
FILES                                           30
EXTENTS                                        200
FILES PER DATASET                               10
SEGMENTS                                        20
FIELDS                                          80
QUALIFICATION SPECIFICATIONS                    30
TIMERS                                          10
DEVICES PER CONTROL UNIT                        10
DEFAULT CREATED DISTRIBUTIONS                   20
7.0


ACCESS METHODS


EACH DATASET TYPE (E.G., "S", "D", "IS") HAS A CORRESPONDING
"ACCESS METHOD" WHICH IS EMBODIED AS A PROGRAM, AND DESCRIBES
ACCESSING OPERATIONS ON THE DATASET. FOR EXAMPLE, A "SEQUENTIAL"
OR "S" TYPE DATASET IS ACCESSED USING THE "SEQUENTIAL ACCESS
METHOD".

EACH DATASET CONSISTS OF ONE OR MORE "ELEMENTARY FILES"
(HEREINAFTER CALLED "FILES"), AND IT IS THE RESPONSIBILITY OF THE
ACCESS METHOD TO RELATE THESE FILES, AND DESCRIBE THE OPERATIONS
WHICH MUST BE PERFORMED ON THEM IN ORDER TO RETRIEVE A GIVEN
"LOGICAL RECORD" OF THE DATASET. FOR EXAMPLE, A REQUEST FOR
RECORD 123 OF AN INDEXED DATASET MIGHT REQUIRE ACCESSES TO A
CYLINDER INDEX FILE, A TRACK INDEX FILE, AND THE PRIME DATA FILE
IN ORDER TO SATISFY THE REQUEST.

THE FILES OF A DATASET CAN BE SPECIFIED BY THE MODELER AS PART
OF HIS MODEL INPUT, OR CAN BE SUPPLIED BY THE ACCESS METHOD TO
THE EXTENT THAT IT IS PROGRAMMED TO PROVIDE DEFAULT FILES.

THE THREE ACCESS METHODS DESCRIBED HEREIN HAVE BEEN INCLUDED TO
ILLUSTRATE A RANGE OF "SOPHISTICATION" OF ACCESS METHODS. THEY
ARE INTENTIONALLY SIMILAR TO THREE IBM OS/360 ACCESS METHODS:
(1) BASIC DIRECT ACCESS METHOD (BDAM), (2) QUEUED SEQUENTIAL
ACCESS METHOD (QSAM), AND (3) BASIC INDEX SEQUENTIAL ACCESS
METHOD (BISAM). THEY REQUIRE 165, 267, AND 817 LINES OF FORTRAN
CODE, RESPECTIVELY, TO IMPLEMENT.

FOR THE DISCUSSION THAT FOLLOWS, IT IS ASSUMED THAT THE USER IS
FAMILIAR WITH THE BASIC CONCEPTS OF THE OS/360 ACCESS METHODS
UNDER DISCUSSION, AS DESCRIBED IN IBM PUBLICATION (C28-6646),
"IBM/360 OPERATING SYSTEM, SUPERVISOR AND DATA MANAGEMENT
SERVICES".


7.1 


ADDING ACCESS METHODS TO PHASE II


FOLLOWING ARE SOME BRIEF NOTES ON ADDING ACCESS METHODS TO
PHASE II. IN GENERAL, THE FOLLOWING STEPS WILL BE NEEDED:

(1) MODIFY DATASET ROUTINES ("DS" MODULE) TO RECOGNIZE A NEW
DATASET TYPE, AND CODE CALLS TO USER-SUPPLIED "READ", "PRINT",
AND "DUMP" ACCESS-METHOD-RELATED PARAMETERS ROUTINES. IF THERE
ARE NO ACCESS-METHOD-RELATED PARAMETERS, THIS STEP MAY BE
OMITTED.

(2) MODIFY PROCEDURE ROUTINES ("RPR", "EXPR", AND "AUXPR" MODULES)
TO RECOGNIZE NEW PROCEDURE OP CODES TO BE INTRODUCED (IF ANY),
AND PROVIDE APPROPRIATE PROCEDURE STATEMENT INTERPRETATION AND
EXECUTION SECTIONS.

(3) WRITE SEVERAL SUBROUTINES:

READ ACCESS-METHOD-RELATED PARAMETERS (IF ANY)

PRINT ACCESS-METHOD-RELATED PARAMETERS (IF ANY)

DUMP ACCESS-METHOD-RELATED PARAMETERS (IF ANY)

INTERPRET DATASET; THAT IS, SET UP THE DATASET FILES TO
REFLECT THEIR INTERPRETATION FOR THIS DATASET ORGANIZATION.
THIS MIGHT INCLUDE THE CREATION OF DEFAULT FILES NEEDED TO
COMPLETELY SPECIFY THE DATASET. FOR A ONE-FILE DATASET, THIS
ROUTINE IS NOT NECESSARY.

EXECUTION ROUTINES (CALLED BY "EXPR" OR "AUXPRE") WHICH
SIMULATE THE EXECUTION OF THE ACCESS METHOD.

DETAILS ABOUT THE PRECEDING CAN BEST BE OBTAINED BY EXAMINING THE
SAMPLE ACCESS METHODS WHICH HAVE BEEN PROVIDED WITH THE SYSTEM,
AND WHICH COVER A RANGE OF ACCESS METHOD COMPLEXITY.


7.2 BASIC DIRECT ACCESS METHOD - TYPE "D" DATASETS

A DIRECT ACCESS DATASET REQUIRES ONLY ONE FILE TO REPRESENT IT.
ANY RECORD OF THE FILE MAY BE REACHED DIRECTLY BY ADDRESS. AFTER
A READ OR WRITE IS INITIATED, CONTROL IS RETURNED TO THE USER,
WHO MUST ISSUE A SUBSEQUENT "WAIT" TO ENSURE THAT I/O HAS
COMPLETED. "READD" AND "WRITD" ARE ALLOWABLE ACCESS OPS FOR
DIRECT DATASETS.


7.3 QUEUED SEQUENTIAL ACCESS METHOD - TYPE "S" DATASETS

A SEQUENTIAL DATASET ALSO REQUIRES ONLY ONE FILE TO REPRESENT IT.
A GIVEN RECORD OF THE DATASET IS ACCESSED BY PACING THROUGH THE
FILE, RECORD BY RECORD, UNTIL THE REQUESTED RECORD IS REACHED
(STARTING WITH RECORD ONE, IF THE FILE IS NOT ALREADY OPEN).
SEQUENTIAL ACCESS ASSUMES SEQUENTIAL PROCESSING, SO RECORDS IN
ADVANCE OF THE REQUESTED RECORD ARE ALSO READ IN, IN ANTICIPATION
OF UPCOMING REQUESTS. CONTROL IS NOT RETURNED TO THE USER UNTIL
THE RECORD REQUESTED HAS BEEN READ IN. BUFFERING IN BOTH THE
"MOVE" AND "LOCATE" MODES IS PROVIDED. "READS", "WRITS" AND
"UPDATE" ARE ALLOWABLE ACCESS OPS FOR SEQUENTIAL DATASETS, WITH
UPDATE ALLOWABLE ONLY ON DATASETS BEING READ IN LOCATE MODE.


7.4 BASIC INDEX SEQUENTIAL ACCESS METHOD - TYPE "IS" DATASETS

INDEX SEQUENTIAL DATASETS ALLOW FOR RANDOM RETRIEVAL OF RECORDS
ON THE BASIS OF KEY VALUE. AN "IS" DATASET ALSO ALLOWS FOR
INSERTION OF NEW RECORDS WITHOUT REWRITING THE WHOLE DATASET.
THESE FUNCTIONS ARE ACCOMPLISHED BY THE USE OF HIERARCHICAL
INDEXES AND OVERFLOW AREAS.

TABLE SIZES RESTRICT "IS" DATASETS TO A TOTAL OF 10.

AN "IS" DATASET CONTAINS SEVERAL FILES, OF THE FOLLOWING TYPES
AND DEFINITIONS ("TYPE" IS A PARAMETER OF A FILE DESCRIPTION, SEE
SECTION 3.0):

PR - PRIME FILE. THE NON-OVERFLOW DATA RECORDS ARE STORED ON
     THIS FILE.

TI - TRACK INDEX FILE. THIS FILE SHARES EXTENTS WITH THE PRIME
     FILE. THAT IS, EACH PRIME CYLINDER ALSO CONTAINS ONE OR
     MORE TRACKS OF TRACK INDEX. FOR EACH TRACK OF PRIME FILE
     ON A CYLINDER, THERE ARE TWO RECORDS ON THE TRACK INDEX,
     ONE POINTING TO THE PRIME TRACK WHOSE HIGHEST KEY IS THE
     KEY OF THE RECORD, AND ONE POINTING TO THE OVERFLOW CHAIN
     FOR THAT TRACK, IF ANY.

OF - OVERFLOW FILE. THIS FILE CONTAINS THE OVERFLOW RECORDS
     FOR THE DATASET. IT MAY SHARE EXTENTS WITH THE "PR" AND
     "TI" FILES, IN WHICH CASE ALL THE OVERFLOW RECORDS FOR
     A CYLINDER ARE STORED ON THE CYLINDER ITSELF, ON TRACKS
     RESERVED FOR THAT USE, OR IT MAY BE CONTAINED IN AN
     INDEPENDENT OVERFLOW AREA. THE FORMER IS THE DEFAULT
     OPTION.

CI - CYLINDER INDEX FILE. THIS IS A SEPARATE FILE, CONTAINING
     ONE RECORD FOR EACH CYLINDER OCCUPIED BY THE PRIME FILE.
     EACH RECORD POINTS TO THE CYLINDER WHOSE HIGHEST KEY
     IS THE KEY FIELD OF THE RECORD.

MI1, MI2, MI3 - MASTER INDEX FILES. "MI1" INDEXES THE CYLINDER
     INDEX, ONE RECORD PER TRACK OF THE "CI". EACH RECORD POINTS
     TO THE TRACK OF THE "CI" WHOSE HIGHEST KEY IS THE KEY FIELD
     OF THE RECORD. IN A SIMILAR WAY, "MI2" INDEXES "MI1",
     AND "MI3" INDEXES "MI2". THE CREATION OF MASTER INDEXES
     IS CONTROLLED BY THE "NMI" PARAMETER (SEE BELOW).


CONTENTS OF INDEX FILE RECORDS ARE DIRECT ACCESS DEVICE ADDRESS
POINTERS, AND BY DEFAULT ARE ASSUMED TO BE 10 BYTES LONG. ANY OR
ALL OF THE FILES OF AN "IS" DATASET MAY BE LEFT TO DEFAULT
SPECIFICATION BY THE ACCESS METHOD.

FOR AN "IS" DATASET, SEVERAL ACCESS METHOD DEFINED PARAMETERS ARE
NEEDED TO COMPLETELY CHARACTERIZE THE DATASET. THESE ARE CONTAINED
IN A "PARAM" CARD OF THE FOLLOWING FORM (SEE SECTION 3.4):


PARAM PPF,NMI,KL,CLD

AND ARE DEFINED AS FOLLOWS:


PARAM  DEFINITION                                          DEFAULT

PPF    FRACTION OF PRIME AREA FULL. FOR EXAMPLE,           1.0
       PPF = 1.0 MEANS THAT THE PRIME AREA IS FULL, AND
       NO RECORDS ARE STORED IN OVERFLOW, PPF = 1.1
       MEANS THAT THE PRIME AREA IS FULL, AND THERE ARE
       AN EQUIVALENT OF 10% OF THE PRIME RECORDS STORED
       IN OVERFLOW, AND PPF = .9 MEANS THAT THERE IS NO
       OVERFLOW, AND THE PRIME AREA IS ONLY 90% FULL
       (WITH THE "HOLES" DISTRIBUTED EVENLY THROUGHOUT
       THE PRIME AREA).

NMI    NUMBER OF TRACKS OF A MASTER INDEX TO BE            NO MI'S
       ALLOWED BEFORE A HIGHER LEVEL MASTER INDEX IS
       CREATED

KL     KEY LENGTH                                          10

CLD    OVERFLOW CHAIN LENGTH DISTRIBUTION. THE OVERFLOW
       RECORDS FOR A TRACK ARE CHAINED TOGETHER; HENCE
       WHEN AN OVERFLOW RECORD IS TO BE ACCESSED, ONE
       OR MORE RECORDS IN THE OVERFLOW AREA WILL BE
       READ UNTIL THE REQUESTED RECORD IS REACHED. THE
       USER MAY CONSTRUCT A DISTRIBUTION OF CHAIN
       LENGTHS (I.E., THE TOTAL NUMBER OF OVERFLOW
       RECORDS TO BE ASSOCIATED WITH EACH TRACK) AND
       PLACE ITS NAME IN THIS FIELD, OR OMIT THE FIELD
       AND ALLOW THE ACCESS METHOD TO CONSTRUCT THE
       "NATURAL DISTRIBUTION" P(N), DEFINED IN SECTION 12.2.


THE "IS" ACCESS METHOD PROVIDES FOR RANDOM READING, WRITING, AND
UPDATING OF RECORDS IN AN "IS" DATASET BY MEANS OF THE "READR",
"WRITR", AND "UPDATE" ACCESS OPS, RESPECTIVELY.

A RANDOM READ IS CARRIED OUT AS FOLLOWS:

(1) READ DOWN HIGHEST LEVEL "MI" SEQUENTIALLY UNTIL THE DESIRED
KEY IS "BRACKETED"

(2) CONTINUE TO READ LOWER LEVELS OF MASTER INDEX AND CYLINDER
INDEX TO LOCATE CYLINDER ON WHICH THE RECORD IS LOCATED

(3) READ "TI" TO FIND TRACK ON WHICH THE RECORD EXISTS, OR IF IT
IS AN OVERFLOW RECORD, OBTAIN THE ADDRESS OF THE FIRST RECORD
IN THE OVERFLOW CHAIN FOR THE TRACK

(4) READ THE PRIME RECORD, OR CHAIN OF OVERFLOW RECORDS UNTIL THE
REQUIRED RECORD IS FOUND
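
THE RANDOM READ STEPS ABOVE CAN BE SKETCHED IN A FEW LINES OF PYTHON.
THIS IS AN ILLUSTRATION UNDER SIMPLIFYING ASSUMPTIONS, NOT THE PHASE II
FORTRAN: THERE IS NO MASTER INDEX (SO STEP 1 IS SKIPPED), INDEXES ARE
PLAIN LISTS OF HIGH KEYS READ IN ORDER, AND THE PRIME TRACK IS COUNTED
AS ONE READ. THE DATA LAYOUT AND ALL NAMES ("scan", "random_read",
"prime", "overflow", "high") ARE HYPOTHETICAL.

```python
def scan(high_keys, key):
    """Read index records in order until one brackets the key.

    Returns (position of the bracketing record, records read).
    """
    for i, high in enumerate(high_keys):
        if key <= high:
            return i, i + 1
    raise KeyError(key)


def random_read(cylinders, key):
    """Return (found, reads), where reads counts records read per file."""
    reads = {}
    # Step 2: cylinder index, one record (highest key) per cylinder.
    cyl_index = [cyl[-1]["high"] for cyl in cylinders]
    c, reads["CI"] = scan(cyl_index, key)
    # Step 3: track index, two records per prime track
    # (highest prime key, then highest key in the overflow chain).
    track_index = []
    for track in cylinders[c]:
        track_index += [track["prime"][-1], track["high"]]
    e, reads["TI"] = scan(track_index, key)
    track = cylinders[c][e // 2]
    # Step 4: read the prime track, or follow the overflow chain.
    if e % 2 == 0:
        reads["PR"] = 1
        return key in track["prime"], reads
    chain = track["overflow"]
    found = key in chain
    reads["OF"] = chain.index(key) + 1 if found else len(chain)
    return found, reads


cylinders = [
    [{"prime": [10, 20, 30], "overflow": [33, 37], "high": 37},
     {"prime": [40, 50, 60], "overflow": [], "high": 60}],
    [{"prime": [70, 80, 90], "overflow": [], "high": 90},
     {"prime": [100, 110, 120], "overflow": [125], "high": 125}],
]
print(random_read(cylinders, 110))
print(random_read(cylinders, 37))
```

A RECORD IN OVERFLOW COSTS ONE READ PER CHAIN ELEMENT PASSED, WHICH IS
WHY THE CHAIN LENGTH DISTRIBUTION MATTERS FOR TIMING.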

A RANDOM WRITE (ASSUMED TO BE AN INSERT) IS CARRIED OUT AS
FOLLOWS:

(1)-(4) AS IN RANDOM READ

IF THE RECORD IS TO BE INSERTED IN AN OVERFLOW CHAIN:

(5) WRITE A NEW RECORD AT THE END OF THE OVERFLOW AREA, AND
    REWRITE THE NEXT-TO-LAST OVERFLOW RECORD READ TO UPDATE ITS
    CHAIN POINTER

IF THE RECORD IS TO BE INSERTED IN THE PRIME AREA:

(5) RE-WRITE THE LAST BLOCK READ, READ AND WRITE THE REMAINING
    BLOCKS ON THE TRACK

(6) REWRITE BOTH TRACK INDEX RECORDS FOR THIS TRACK

(7) WRITE AN OVERFLOW RECORD AT THE END OF THE OVERFLOW AREA

AN UPDATE FOLLOWING A READ MERELY REWRITES THE LAST BLOCK READ,
WITH NO INDEX SEARCH REQUIRED.
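
THE RANDOM READ STEPS ABOVE CAN BE SKETCHED AS A COUNT OF RECORD
READS. THIS IS A MINIMAL PYTHON ILLUSTRATION, NOT THE MODEL'S
FORTRAN CODE. THE FUNCTION NAME AND ITS PARAMETERS ARE
HYPOTHETICAL, AND IT ASSUMES THAT THE TRACK INDEX LOOKUP COSTS
ONE READ.

```python
def random_read_accesses(index_reads_per_level, overflow_position=0):
    """Count the record reads needed for one random read.

    index_reads_per_level: reads made at each master/cylinder index
        level before the key is bracketed (steps 1 and 2).
    overflow_position: 0 if the record is in the prime area, else its
        1-based position in the overflow chain of its track.
    """
    reads = sum(index_reads_per_level)  # steps (1)-(2): index levels
    reads += 1                          # step (3): track index (assumed one read)
    if overflow_position == 0:
        reads += 1                      # step (4): the prime record itself
    else:
        reads += overflow_position      # step (4): walk the overflow chain
    return reads
```

A LONGER OVERFLOW CHAIN RAISES THE COUNT DIRECTLY, WHICH IS WHY
THE CHAIN LENGTH DISTRIBUTION IS AN INPUT TO THE ACCESS METHOD.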


8.0  I/O SUPERVISOR


THE I/O SUPERVISOR IS A PROGRAM (MODULE "AC") WHICH ACCEPTS I/O
REQUESTS, MARSHALS THE REQUESTS THROUGH VARIOUS QUEUES, AND SEES
THEM THROUGH TO COMPLETION. IT MAINTAINS THE CLOCK AND THE
HARDWARE DEVICES (THEIR ROTATIONAL DISPLACEMENT, ACCESS ARM
POSITION, AND STATUS, SUCH AS DEVICE BUSY, CHANNEL BUSY, ETC.).

IT IS IMPLEMENTED AS A SIMPLE EVENT DRIVEN QUEUING MODEL, IN WHICH
THE "STATIONS" ARE DEVICES, CHANNELS, AND THE CPU, AND THE
"EVENTS" ARE BEGIN AND END SEEK, BEGIN AND END DATA TRANSMISSION,
AND BEGIN AND END CPU PROCESSING. IT HAS ENTRY POINTS "AC",
"WAIT", AND "PROC", AS DESCRIBED IN SECTION 9.3.

REQUESTS FOR SEEKS OR TRANSMITS ARE QUEUED UP ON DEVICES AND
CHANNELS, RESPECTIVELY. DEVICES WHICH ARE SEEKING ARE CHAINED
TOGETHER IN A DEVICE "TIME-OF-COMPLETION" (TC) CHAIN, IN ORDER
OF COMPLETION. IN A SIMILAR WAY, TRANSMITTING CHANNELS ARE TIED
TOGETHER IN A CHANNEL TC CHAIN.
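
A TC CHAIN AS DESCRIBED ABOVE CAN BE SKETCHED AS A LIST KEPT IN
ORDER OF COMPLETION TIME. THIS IS A MINIMAL PYTHON ILLUSTRATION;
THE CLASS NAME, METHOD NAMES, AND DEVICE NAMES ARE HYPOTHETICAL,
AND THE MODEL ITSELF CHAINS TABLE ROWS THROUGH CHAIN WORDS RATHER
THAN USING A SORTED LIST.

```python
import bisect


class TCChain:
    """Stations (devices or channels) ordered by time of completion."""

    def __init__(self):
        self._chain = []  # sorted list of (time_of_completion, station)

    def insert(self, tc, station):
        # Keep the chain in order of completion as each station is added.
        bisect.insort(self._chain, (tc, station))

    def next_event(self):
        # Remove and return the earliest (tc, station), or None if empty.
        return self._chain.pop(0) if self._chain else None


device_chain = TCChain()
device_chain.insert(42.5, "DV3")
device_chain.insert(17.0, "DV1")
device_chain.insert(30.2, "DV2")
```

THE HEAD OF THE CHAIN IS ALWAYS THE NEXT EVENT TO OCCUR, SO THE
CLOCK CAN BE ADVANCED TO ITS TIME OF COMPLETION.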

CONTROL UNITS ARE SWITCHABLE BETWEEN CHANNELS, BUT ONCE A CONTROL
UNIT IS "ATTACHED" TO A CHANNEL (BY BEING USED TO ISSUE A SEEK),
IT REMAINS ATTACHED UNTIL THE REQUEST IS COMPLETED, THAT IS, UNTIL
END OF DATA TRANSMISSION.

THE FOLLOWING IS A DESCRIPTION OF THE LOGIC OF THE I/O SUPERVISOR,
DESCRIBING WHAT HAPPENS WHEN AN I/O REQUEST IS RECEIVED BY THE
SYSTEM, AND WHAT HAPPENS WHEN THE VARIOUS TYPES OF EVENT OCCUR.


8.1  I/O REQUEST

(1) PLACE REQUEST IN REQUEST TABLE
(2) ATTACH REQUEST TO DEVICE QUEUE
(3) IF DEVICE, CONTROL UNIT, CHANNEL FREE, START SEEK


8.2  START SEEK

(1) COMPUTE DEVICE TC (END SEEK)
(2) PLACE DEVICE IN DEVICE TC CHAIN
(3) MAKE DEVICE BUSY
(4) ATTACH CONTROL UNIT TO CHANNEL, INCREMENT CU USE COUNTER


8.3  END SEEK

(1) UPDATE THE CLOCK
(2) DETACH REQUEST FROM DEVICE QUEUE
(3) ATTACH REQUEST TO CHANNEL QUEUE
(4) REMOVE DEVICE FROM DEVICE TC CHAIN
(5) IF CU AND CHANNEL ARE FREE, START TRANSMIT


8.4  START TRANSMIT

(1) COMPUTE CHANNEL TC
(2) PLACE CHANNEL IN CHANNEL TC CHAIN
(3) MAKE CHANNEL BUSY
(4) MAKE CU BUSY


8.5  END TRANSMIT (NON FORMAT WRITE)

(1) UPDATE THE CLOCK
(2) SIGNAL I/O COMPLETION TO REQUESTING PROGRAM
(3) DETACH REQUEST FROM CHANNEL QUEUE
(4) REMOVE CHANNEL FROM CHANNEL TC CHAIN
(5) FREE CHANNEL
(6) FREE DEVICE AND CU
(7) DECREMENT CU USE COUNTER. IF ZERO, DETACH CU FROM CHANNEL.
(8) REMOVE REQUEST FROM REQUEST TABLE
(9) FOR FREE CU'S ON THIS CHANNEL (BUT NOT CURRENTLY ATTACHED TO
    ANOTHER CHANNEL) START SEEKS ON FREE DEVICES
(10) IF THIS CHANNEL HAS A TRANSMIT WAITING, START TRANSMIT


8.6  END TRANSMIT (FORMAT WRITE) (NOT IMPLEMENTED)

(1)-(5) AS IN 8.5 (1)-(5)
(6) ATTACH REQUEST TO DEVICE QUEUE, FIRST IN LINE
(7) COMPUTE DEVICE TC FOR TRACK ERASE
(8) PLACE DEVICE IN DEVICE TC CHAIN
(9)-(10) AS IN 8.5 (9)-(10)


8.7  END TRACK ERASE (FORMAT WRITE) (NOT IMPLEMENTED)

(1) UPDATE THE CLOCK
(2) DETACH REQUEST FROM DEVICE QUEUE
(3) REMOVE DEVICE FROM DEVICE TC CHAIN
(4)-(6) AS IN 8.5 (6)-(8)
(7) IF A CHANNEL IS AVAILABLE, FOR EACH FREE DEVICE ATTACHED TO
    THE CU WITH PENDING SEEKS, START SEEKS
(8) IF A TRANSMIT FOR A DEVICE ON THIS CU IS WAITING ON THE
    CHANNEL TO WHICH THIS CU IS ATTACHED, START TRANSMIT
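
THE CONTROL UNIT USE COUNTER THAT IS INCREMENTED AT START SEEK AND
DECREMENTED AT END TRANSMIT BEHAVES LIKE A REFERENCE COUNT ON THE
CHANNEL ATTACHMENT. THE FOLLOWING PYTHON SKETCH ILLUSTRATES THAT
ONE RULE ONLY. THE CLASS, ITS FIELD NAMES, AND THE CHANNEL NAME
ARE HYPOTHETICAL, AND THE ERROR RAISED ON A CONFLICTING ATTACHMENT
IS AN ASSUMPTION ABOUT HOW SUCH A CASE WOULD BE REJECTED.

```python
class ControlUnit:
    """Hypothetical stand-in for one row of the control unit table."""

    def __init__(self):
        self.channel = None  # channel currently attached to, if any
        self.use = 0         # seeks issued whose transmission is not yet done

    def start_seek(self, channel):
        # START SEEK: attach the CU to the channel, increment the use counter.
        if self.channel is not None and self.channel != channel:
            raise ValueError("control unit is attached to another channel")
        self.channel = channel
        self.use += 1

    def end_transmit(self):
        # END TRANSMIT: decrement the counter; if zero, detach from channel.
        self.use -= 1
        if self.use == 0:
            self.channel = None


cu = ControlUnit()
cu.start_seek("CH1")
cu.start_seek("CH1")
cu.end_transmit()
```

AFTER TWO SEEKS AND ONE COMPLETED TRANSMISSION THE CONTROL UNIT IS
STILL ATTACHED; IT BECOMES SWITCHABLE AGAIN ONLY WHEN THE COUNTER
REACHES ZERO.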




9.0  ORGANIZATION OF THE PHASE II SYSTEM


THIS SECTION CONSTITUTES A PRIMER ON THE IMPLEMENTATION OF THE
PHASE II SYSTEM, DESCRIBING TABLES, SUBROUTINES, AND FLOW OF
CONTROL IN THE SYSTEM. SECTIONS 9 & 10, TOGETHER WITH THE PROGRAM
LISTINGS THEMSELVES, SHOULD PROVIDE A COMPLETE DOCUMENTATION OF
THE SYSTEM.


9.1  TABLES 

A BASIC PROGRAMMING DEVICE OF THE PHASE II SYSTEM IS THE "TABLE",
OF WHICH THERE ARE APPROXIMATELY FOURTEEN. EACH TABLE CONTAINS
INFORMATION ABOUT ALL SYSTEM ENTITIES OF A PARTICULAR TYPE. FOR
EXAMPLE, THE DEVICE TABLE CARRIES DESCRIPTIONS OF EACH OF THE
DEVICES WHICH A MODELER HAS SPECIFIED FOR THE SYSTEM TO BE
MODELED. SECTION 3 CONTAINS DEFINITIONS OF TABLE INPUT PARAMETERS,
AND SECTION 10 DEFINES INTERNAL TABLE PARAMETERS. THE PURPOSE OF
THIS SECTION IS TO DESCRIBE HOW THE TABLES ARE IMPLEMENTED.

A TABLE IS IMPLEMENTED AS AN ARRAY, IN WHICH THE ROWS REPRESENT
ENTITIES, AND THE COLUMNS REPRESENT ATTRIBUTES OF THE ENTITIES.
HOWEVER, BY USING THE FORTRAN "EQUIVALENCE" SPECIFIER, EACH
COLUMN (ATTRIBUTE) MAY BE ADDRESSED AS A ONE-DIMENSIONAL ARRAY,
WITH THE SUBSCRIPT REPRESENTING THE SERIAL NUMBER OF THE ENTITY
UNDER CONSIDERATION.

EACH TABLE HAS A UNIQUE TWO-CHARACTER IDENTIFIER. FOR EXAMPLE,
THE IDENTIFIER OF THE DEVICE TABLE IS "DV". SIMILARLY, EACH
ATTRIBUTE HAS A 1-TO-4 CHARACTER IDENTIFIER. THUS THE DEVICE
"TYPE" ATTRIBUTE IS IDENTIFIED BY THE CHARACTERS "DVTYPE", AND
THIS IS THE INTERNAL NAME OF THE DEVICE TYPE VECTOR. THE DEVICE
TYPE OF DEVICE NUMBER 12 IS THEREFORE GIVEN BY "DVTYPE(12)".

A TABLE MAY REFERENCE AN ENTITY IN ANOTHER TABLE. FOR EXAMPLE,
ONE OF THE ATTRIBUTES OF A DEVICE IS A SPECIFICATION OF THE
CONTROL UNIT (IN THE CONTROL UNIT TABLE) IT IS ATTACHED TO. SUCH
AN ATTRIBUTE TAKES THE FORM OF A POSITIVE INTEGER "INDEX POINTER",
AND IS, IN EFFECT, A ROW NUMBER IN ANOTHER (OR THE SAME) TABLE.
THUS, IF THE "CONTROL UNIT" ATTRIBUTE OF A DEVICE IS "5", THE
DEVICE IS ATTACHED TO THE CONTROL UNIT WHOSE ATTRIBUTES ARE GIVEN
IN ROW 5 OF THE CONTROL UNIT TABLE. IF ONE WISHED TO KNOW WHETHER
OR NOT THAT CONTROL UNIT WERE BUSY, A TEST WOULD BE MADE ON
"CUBUSY(5)". BY APPLYING THE APPROPRIATE LOGIC, THEREFORE, ONE
CAN FIND HIS WAY AROUND THE TABLES AND EXPLORE THE RELATIONSHIPS
AMONG THE ENTITIES THEREIN.
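
THE COLUMN VECTORS AND INDEX POINTERS DESCRIBED ABOVE CAN BE
SKETCHED AS FOLLOWS. THIS IS A PYTHON ILLUSTRATION RATHER THAN THE
SYSTEM'S FORTRAN. THE VECTOR NAMES FOLLOW THE TABLE-IDENTIFIER
PLUS ATTRIBUTE-IDENTIFIER CONVENTION, BUT THE SAMPLE VALUES AND
THE HELPER FUNCTION ARE HYPOTHETICAL. SLOT 0 IS LEFT UNUSED SO
THAT SUBSCRIPTS START AT 1, AS THEY DO IN FORTRAN.

```python
# One vector per attribute; the subscript is the entity's row number.
DVTYPE = [None, "2311", "2314", "2314"]  # device type of each device (sample values)
DVCU = [None, 1, 2, 2]                   # index pointer: row in the control unit table
CUBUSY = [None, 0, 1]                    # busy flag of each control unit


def control_unit_busy(dev):
    """Follow a device's index pointer to its control unit's busy flag."""
    return CUBUSY[DVCU[dev]] != 0
```

FOR DEVICE 3, "DVCU[3]" IS 2, SO THE TEST IS MADE ON "CUBUSY[2]",
THE SAME TWO-STEP LOOKUP THE TEXT DESCRIBES FOR "CUBUSY(5)".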

OCCASIONALLY, AN ATTRIBUTE MAY CONTAIN "REPEATING INFORMATION"
ABOUT AN ENTITY. FOR EXAMPLE, AN ATTRIBUTE OF A CONTROL UNIT IS
A LIST OF DEVICES ATTACHED TO IT. SUCH AN ATTRIBUTE OBVIOUSLY
REQUIRES MORE THAN ONE STORAGE LOCATION TO SPECIFY IT. IT IS
STORED IN MULTIPLE ADJACENT COLUMNS OF THE TABLE, AND A DOUBLE
SUBSCRIPT CONVENTION IS USED TO ADDRESS IT. FOR EXAMPLE, THE
FOURTH DEVICE ATTACHED TO CONTROL UNIT "J" WOULD BE ADDRESSED BY
"CUDV(J,4)".


ALL OF THE PROCESSING ON A GIVEN TABLE IS PERFORMED IN ONE MODULE
(A SEPARATELY COMPILED PROGRAM). FOR EXAMPLE, THE MODULE "LI" HAS
ROUTINES FOR READING, INTERPRETING, PRINTING, AND DUMPING THE
LIST TABLES.


IN THE DISCUSSION THAT FOLLOWS, "XX" WILL BE USED AS AN ARBITRARY
TABLE IDENTIFIER.


THERE ARE FOUR SCALAR PARAMETERS ASSOCIATED WITH EACH TABLE, AS
FOLLOWS:

MAXXX  - THE MAXIMUM NUMBER OF ENTITIES TABLE "XX" MAY DESCRIBE
MAXAXX - THE MAXIMUM NUMBER OF ATTRIBUTES PER ENTITY
NXX    - THE CURRENT NUMBER OF ENTITIES IN THE TABLE
NBXX   - THE NUMBER OF PRE-DEFINED (BUILT-IN) ENTITIES IN TABLE
         "XX"


EACH TABLE IS ACCOMPANIED BY A "TABLE MASTER", WHICH DESCRIBES THE
TABLE. A MASTER HAS THREE ROWS, AND EACH COLUMN CONTAINS THREE
ITEMS OF INFORMATION ABOUT AN ATTRIBUTE IN THE TABLE, NAMELY, ITS
NAME, ITS TYPE, AND THE COLUMN IT OCCUPIES IN THE TABLE (THE
ARRAY COLUMNS ARE NOT NECESSARILY IN THE SAME ORDER AS THE MASTER
COLUMNS). THE NAME IS THE ATTRIBUTE IDENTIFIER DESCRIBED ABOVE,
AND IS ALSO THE NAME USED AS A COLUMN HEADING WHEN THE TABLE IS
PRINTED OUT. THE ENTITY TYPE IS ONE OF THE FOLLOWING:

I     - INTEGER
A     - ALPHAMERIC (UP TO FOUR CHARACTERS)
R     - REAL (FLOATING POINT)
D     - ALPHAMERIC (UP TO EIGHT CHARACTERS - STORED IN ADJACENT
        COLUMNS)
BLANK - SECOND COLUMN OF A TYPE "D" ATTRIBUTE, OR SUCCEEDING
        COLUMNS OF A REPEATING ATTRIBUTE
LI    - INTEGER LIST
LR    - REAL LIST
LA    - ALPHAMERIC LIST

THE LAST THREE TYPES ARE "PSEUDO-TYPES", USED ONLY TO INDICATE
THAT A LITERAL LIST IS OPTIONAL FOR THIS FIELD ON INPUT. AFTER
INPUT, IT IS TREATED AS A TYPE "A" ENTITY (THE NAME OF THE LIST).


THERE ARE THREE SCALAR PARAMETERS ASSOCIATED WITH EACH MASTER
TABLE, AS FOLLOWS:

NIXX - THE NUMBER OF INPUT PARAMETERS FOR TABLE "XX"
NPXX - THE NUMBER OF PARAMETERS TO BE OUTPUT WHEN THE TABLE IS
       PRINTED
NDXX - THE NUMBER OF PARAMETERS TO BE OUTPUT WHEN THE TABLE IS
       DUMPED
IN THE TABLE MASTER, INPUT PARAMETERS ARE LISTED FIRST, FOLLOWED
BY ADDITIONAL PRINT PARAMETERS, FOLLOWED BY THE REMAINING
PARAMETERS.


9.2  FLOW OF CONTROL

THE GROSS LOGIC OF THE PHASE II SYSTEM IS ILLUSTRATED IN FIGURE
9.2.1. EACH BOX IS LABELLED WITH THE NAME OF THE MODULE (SEE
SECTION 9.3) WHICH HANDLES THE PROCESSING DESCRIBED. THERE ARE
THREE MAIN PHASES TO AN EXECUTION OF THE MODEL:

PHASE I   - READ IN TABLE INPUT PARAMETERS, DOWN TO AN
            "*EXECUTE" CONTROL CARD

PHASE II  - INTERPRET THE TABLES. THAT IS, RESOLVE ALL INTER-
            AND INTRA-TABLE REFERENCES, COMPUTE INTERNAL
            PARAMETERS, AND PROVIDE DEFAULT SPECIFICATIONS AS
            NEEDED.

PHASE III - EXECUTE THE PROCEDURE


[FIGURE 9.2.1  GROSS LOGIC OF THE PHASE II SYSTEM.
 FIGURE NOTE: ** MODULE USED DEPENDS ON TABLE BEING PROCESSED]


9.3  MODULES AND ENTRY POINTS


THE PHASE II SYSTEM CONSISTS OF APPROXIMATELY 25 SEPARATELY
COMPILED (OR ASSEMBLED) MODULES. EACH MODULE CONTAINS ONE OR
MORE ENTRY POINTS (WHICH CORRESPOND TO SUBROUTINE CALLS). THE
FOLLOWING DESCRIBES THE FUNCTION OF EACH MODULE, THE TABLES IT
REFERENCES (SEE SECTION 10), AND ITS ENTRY POINTS.


OF THE APPROXIMATELY 120 ENTRY POINTS IN THE SYSTEM, AROUND
60 PERFORM A STANDARD OPERATION ON A TABLE, FOR EXAMPLE, TABLE
INPUT, TABLE DISPLAY, ETC. MOST OF THE TABLES HAVE FIVE ASSOCIATED
ROUTINES OF THIS TYPE, AS FOLLOWS ("XX" STANDS FOR A TABLE
IDENTIFIER):


RXX(NO,IP)   - READ TABLE "XX" FROM LOGICAL DEVICE "NO".
               PRINT THE TABLE (IN INPUT FORMAT) ON THE
               STANDARD OUTPUT DEVICE IF IP IS NOT EQUAL TO 0.

IXX          - INTERPRET TABLE "XX". INTERPRETATION IS
               PERFORMED FOR EACH TABLE KNOWN TO THE SYSTEM
               AFTER ALL TABLES HAVE BEEN INPUT, AND AN
               "*EXECUTE" CONTROL CARD HAS BEEN ENCOUNTERED.

PXX(NO)      - PRINT TABLE "XX" ON LOGICAL FILE "NO"

DXX(NO)      - DUMP (PRINT ALL PARAMETERS, EXTERNAL AND
               INTERNAL) ON FILE "NO"

FINDXX(NAME) - THIS INTEGER FUNCTION FINDS THE ENTITY IN
               TABLE "XX" WHOSE NAME IS THE VALUE OF "NAME",
               AND RETURNS AN INDEX POINTER TO THE REQUESTED
               ENTITY. IT RETURNS A ZERO IF THE ENTITY IS NOT
               IN THE TABLE.

9.3.1  MODULES AND ENTRY POINTS

FOLLOWING IS AN ALPHABETICAL LIST OF MODULES (TOGETHER WITH
THEIR ENTRY POINTS) CURRENTLY IN THE SYSTEM:


AC      I/O SUPERVISOR
        TABLES: BU, CH, CU, CP, DV
        ENTRY POINTS:
        AC(DEV,CYL,TRKP,TMT,BUFP,BUF,TYP) - INITIATE I/O REQUEST
        WAIT(BUFP,BUF) - WAIT FOR A SPECIFIC REQUEST TO COMPLETE
        PROC(T) - INITIATE CPU PROCESSING
        RESET - RESET SYSTEM TO TIME ZERO
        START - INITIALIZE SYSTEM
        PIO(NO) - PRINT STATUS OF QUEUES AND HARDWARE
        DBU(NO) - DUMP BUFFER TABLES
        DQ(NO) - DUMP QUEUE TABLES


ALLOC   FILE ROUTINES
        TABLES: DP, DV, FL
        ENTRY POINTS:
        ALLOC - ALLOCATE SPACE FOR ALL FILES
        LOCATE(FILE,REC,DEV,CYL,TRKP) - LOCATE A RECORD OF A FILE
        IFL(I) - INTERPRET A FILE
        CKEATEI^AHE.TyPE.OEVT.IRPB.^PC.iLLT.'ALl.aTyP.NauF.HV.CH.EXX.


AUXPR   "AUXILIARY" PROCEDURE OPS. EXTENSION TO "EXPR".
        TABLES: DS, FL, LI, PR
        ENTRY POINTS:
        AUXPRI(NOP,I,*,*) - AUXILIARY PROCEDURE OP INTERPRET
        AUXPRE(NOP,SN,*,*,*) - AUXILIARY PROCEDURE OP EXECUTE


BD      BLOCK DATA FOR PRE-DEFINED SYSTEM ELEMENTS
        TABLES: CH, CU, DI, DS, DP, DV, FL, LI, QU, SG, TB


BDAM    DIRECT ACCESS ROUTINES
        TABLES: BU, FL
        ENTRY POINTS:
        READD(FIL,REC) - READ A RECORD FROM A FILE
        WRITD(FIL,REC) - WRITE A RECORD ON A FILE
        WAITD(FIL) - WAIT FOR I/O COMPLETION

DI      DISTRIBUTION ROUTINES
        TABLES: DI
        ENTRY POINTS:
        RDI(NO,IP) - READ DISTRIBUTIONS
        IDI - INTERPRET DISTRIBUTIONS
        PDI(NO) - PRINT DISTRIBUTIONS
        DDI(NO) - DUMP DISTRIBUTIONS
        CREATD(TYPE,MODE,IPT) - CREATE A DISTRIBUTION
        PUTD(IDIS,ARG,VAL) - PUT AN ENTRY IN A DISTRIBUTION
        FINDDI(NAME) - FIND A DISTRIBUTION
        DIST(NO,RLO,RHI) - RETURN A RANDOM VALUE FROM A DISTRIBUTION
        IDIST(NO,LO,HI) - SAME FOR INTEGER DISTRIBUTIONS
        DISTV(NO,ARG) - RETURN PROBABILITY OF "ARG" (DISCRETE DIST)
        DISTC(NO,ARG) - RETURN CUMULATIVE PROBABILITY OF "ARG"
        DISTA(NO,VAL) - INVERSE OF DISTC
        IDISTA(NO,VAL) - SAME FOR INTEGER DISTRIBUTIONS


DS      DATASET ROUTINES
        TABLES: FL, DS
        ENTRY POINTS:
        RDS(NO,IP) - READ DATASETS
        IDS - INTERPRET DATASETS
        PDS(NO) - PRINT DATASETS
        DDS(NO) - DUMP DATASETS
        FINDDS(NAME) - FIND DATASET
        IDS2 - POST-ALLOCATION INTERPRET DATASETS

ERROR   ERROR HANDLER
        ENTRY POINTS:
        ERROR(N1,N2) - SIGNAL THE SYSTEM THAT AN ERROR HAS OCCURRED


EXPR    PROCEDURE INTERPRET AND EXECUTE
        TABLES: DI, FL, LI, PR, QU, SG
        ENTRY POINTS:
        IPR - INTERPRET PROCEDURE TABLES
        EXPR - EXECUTE PROCEDURE
        EXPRE - RE-ENTRY POINT TO "EXPR" FOR HANDLING ERRORS


HD      HARDWARE ROUTINES
        TABLES: CH, CU, DP, DV
        ENTRY POINTS:
        RHD(NO,IP) - READ HARDWARE TABLES
        PHD(NO) - PRINT HARDWARE TABLES
        IDV - INTERPRET DEVICES
        DDV(NO) - DUMP DEVICES
        IDP - INTERPRET DEVICE PROTOTYPES
        DDP(NO) - DUMP DEVICE PROTOTYPES
        ICU - INTERPRET CONTROL UNITS
        DCU(NO) - DUMP CONTROL UNITS
        ICH - INTERPRET CHANNELS
        DCH(NO) - DUMP CHANNELS
        FINDDV(NAME) - FIND A DEVICE
        FINDCU(NAME) - FIND A CONTROL UNIT
        FINDCH(NAME) - FIND A CHANNEL
        FINDDP(NAME) - FIND A DEVICE PROTOTYPE
        PDP(NO) - PRINT DEVICE PROTOTYPES
        RDP(NO,IP) - READ DEVICE PROTOTYPES

INTER   TABLE I/O COMMON ROUTINES
        ENTRY POINTS:
        RDCD(NO,IP,KEY) - READ AN INPUT CARD, RETURN KEYWORD
        INTERP(MASTER,CONTEN,N1,M1,N,M,IP) - INTERPRET AND STORE ONE
          CARD OF DATA ACCORDING TO MASTER AND CONTENTS TABLES
          SUPPLIED
        HEAD(MASTER,CONTEN,FORMT,BEG,END,M1,N,M,NO,N1,N2) - CONSTRUCT
          A HEADING FOR A TABLE TO BE OUTPUT AND A FORMAT STATEMENT
          FOR THE CONTENTS OUTPUT
        STORE(NAMEX,TYPEX,MAPX,MASTER,M1) - STORE NAME, TYPE, AND
          MASTER VECTORS IN MASTER
        CONVER(INPUX,OUTPUX,SPEC) - CONVERT FROM EBCDIC TO INTEGER OR
          REAL


ISAM    INDEX-SEQUENTIAL DATASET NON-EXECUTION TIME ROUTINES
        TABLES: DP, DS, FL, IS
        ENTRY POINTS:
        RXIS(NO,IP,NENT) - READ ISAM-RELATED DATASET PARAMETERS
        PXIS(NO,NENT,I1,I2) - PRINT ISAM-RELATED PARAMETERS
        DXIS(NO,NENT) - DUMP ISAM TABLE
        IXIS(NO) - INTERPRET ISAM DATASET


LI      LIST ROUTINES
        TABLES: LI, QU
        ENTRY POINTS:
        RLI(NO,IP) - READ LISTS
        PLI(NO) - PRINT LISTS
        DLI(NO) - DUMP LISTS
        ILI - INTERPRET LISTS
        FINDLI(NAME) - FIND A LIST
        GET(LNO) - GET NEXT ELEMENT FROM A LIST
        IGET(LNO) - SAME FOR INTEGER-VALUED LISTS
        CREATL(TYPE,MO,SIZE,LO,HS,DIS,LPT) - CREATE A LIST
        PUT(LIST,ENTRY) - PUT AN ELEMENT IN A CREATED LIST
        REINIT(LNO) - REINITIALIZE A LIST
        PEMB(LFIX,NO1,NO2,ND) - PRINT "PROCEDURE-EMBEDDED" LIST
        RLLI - RELEASE CREATED LISTS, PACK LIST TABLES


MAIN    MAIN PROGRAM - OVERALL CONTROL OF SYSTEM


OPEN    FILE OPEN AND CLOSE
        TABLES: BU, FL
        ENTRY POINTS:
        OPEN(FIL,STAT,NBUF,TYP,...) - OPEN A FILE
        CLOSE(FIL) - CLOSE A FILE


POP     USED BY ERROR ROUTINE (ASSEMBLY LANGUAGE)
        ENTRY POINTS:
        POP(NAME,SAVE) - RETURN NAME AND SAVE AREA OF CALLING ROUTINE.
          REMOVE CALLING ROUTINE FROM "CALL" CHAIN.
        LINK - LINK TO ENTRY "MAIN"


QSAM    SEQUENTIAL ACCESS ROUTINES
        TABLES: FL, BU
        ENTRY POINTS:
        READS(FIL) - READ NEXT RECORD
        WRITS(FIL) - WRITE NEXT RECORD
        UPDATS(FIL) - UPDATE LAST RECORD READ


QU      QUALIFICATION ROUTINES
        TABLES: DI, QU, SG
        ENTRY POINTS:
        RQU(NO,IP) - READ QUALIFICATIONS
        PQU(NO) - PRINT QUALIFICATIONS
        IQU - INTERPRET QUALIFICATIONS
        DQU(NO) - DUMP QUALIFICATIONS

        FINDQU - FIND A QUALIFICATION


RAN     RANDOM NUMBER GENERATOR (ASSEMBLY LANGUAGE)
        SEE LEWIS, GOODMAN, AND MILLER, A PSEUDO-RANDOM NUMBER
        GENERATOR FOR THE IBM 360, IBM RESEARCH REPORT RC 2330.
        ENTRY POINTS:
        RAND(X) - RETURN A REAL RANDOM NUMBER ON


RAND    RANDOM NUMBER GENERATOR
        ENTRY POINTS:
        RANDX(X,Y) - RETURN A REAL RANDOM NUMBER ON (X,Y)
        IRANDX(IX,IY) - RETURN AN INTEGER RANDOM NUMBER ON (IX,IY)


READR   INDEX-SEQUENTIAL DATASET EXECUTION-TIME ROUTINES
        TABLES: BU, DS, FL, IS
        ENTRY POINTS:
        READR(NOF,NO) - READ A RECORD
        WRITR(NOF,NO) - WRITE A RECORD
        UPDATR(NOF) - UPDATE LAST RECORD READ


RPR     PROCEDURE TABLE ROUTINES
        TABLES: PR
        ENTRY POINTS:
        RPR(NO,IP) - READ PROCEDURE
        DPR(NO) - DUMP PROCEDURE
        PPR(NO) - PRINT PROCEDURE
        PTI - PRINT TIMERS


SG      SEGMENT TABLE ROUTINES
        TABLES: DS, SG
        ENTRY POINTS:
        RSG(NO,IP) - READ SEGMENTS
        ISG - INTERPRET SEGMENTS
        PSG(NO) - PRINT SEGMENTS
        DSG(NO) - DUMP SEGMENTS
        FINDSG(NAME) - FIND A SEGMENT
        IFD - INTERPRET FIELDS
        DFD(NO) - DUMP FIELDS
        FINDFD(NAME) - FIND A FIELD


TB      TABLE ROUTINES
        TABLES: TB
        ENTRY POINTS:
        RTB(NO,IP) - READ TABLES
        PTB(NO) - PRINT TABLES
        DTB(NO) - DUMP TABLES
        FINDTB(NAME) - FIND A TABLE
        TABLE(TAB,ARG) - TABLE LOOK-UP FUNCTION
        ITABLE(TAB,ARG) - SAME FOR INTEGER TABLES


TBD     TRACE BLOCK DATA FOR USE IN CONJUNCTION WITH "*TRACE"
        CONTROL CARD. THERE ARE THREE OBJECT MODULES SUPPLIED WITH
        THE SYSTEM FOR THIS PURPOSE: "TBD000" WHICH SUPPLIES NO
        TRACING (EXCEPT UNDER CONTROL OF THE "TRACE" PROCEDURE
        OP), "TBD001" WHICH TRACES ALL SUBROUTINE CALLS, AND
        "TBD002" WHICH TRACES ALL ROUTINES EXCEPT TABLE ROUTINES,
        WHICH PRODUCE VOLUMINOUS AND CONFUSING TRACE INFORMATION.
        "TBD000" IS OBTAINED BY DEFAULT, AND THE OTHERS MAY BE
        INVOKED BY INCLUSION AT LINK-EDIT TIME, AND APPROPRIATE USE
        OF THE CONTROL CARD (SEE SECTION ?)


TIME    TASK TIMING ROUTINES (ASSEMBLY LANGUAGE)
        ENTRY POINTS:
        TIME - SET INTERVAL TIMER
        ITIME(X) - RETURN INTERVAL TIMER VALUE


FOLLOWING IS AN ALPHABETICAL LIST OF ALL ENTRY POINTS. EACH
ENTRY POINT REFERS TO THE MODULE CONTAINING IT.


ENTRY POINT   MODULE

AC            AC
ALLOC         ALLOC
AUXPRE        AUXPR
AUXPRI        AUXPR
CLOSE         OPEN
CONVER        INTER
CREATD        DI
CREATE        ALLOC
CREATL        LI
DBU           AC
DCH           HD
DCU           HD
DDI           DI
DDP           HD
DDS           DS
DDV           HD
DFD           SG
DIST          DI
DISTA         DI
DISTC         DI
DISTV         DI
DLI           LI
DPR           RPR
DQ            AC
DQU           QU
DSG           SG
DTB           TB
DXIS          ISAM
ERROR         ERROR
EXPR          EXPR
EXPRE         EXPR
FINDCH        HD
FINDCU        HD
FINDDI        DI
FINDDP        HD
FINDDS        DS
FINDDV        HD
FINDFD        SG
FINDLI        LI
FINDQU        QU
FINDSG        SG
FINDTB        TB
GET           LI
HEAD          INTER
ICH           HD
ICU           HD
IDI           DI
IDIST         DI
IDISTA        DI
IDP           HD
IDS           DS
IDS2          DS
IDV           HD
IFD           SG
IFL           ALLOC
IGET          LI
ILI           LI
INTERP        INTER
IPR           EXPR
IQU           QU
IRANDX        RAND
ISG           SG
ITABLE        TB
ITIME         TIME
IXIS          ISAM
LINK          POP
LOCATE        ALLOC
MAIN          MAIN
OPEN          OPEN
PDI           DI
PDP           HD
PDS           DS
PEMB          LI
PHD           HD
PIO           AC
PLI           LI
POP           POP
PPR           RPR
PQU           QU
PROC          AC
PSG           SG
PTB           TB
PTI           RPR
PUT           LI
PUTD          DI
PXIS          ISAM
RAND          RAN
RANDX         RAND
RDCD          INTER
RDI           DI
RDP           HD
RDS           DS
READD         BDAM
READR         READR
READS         QSAM
REINIT        LI
RESET         AC
RHD           HD
RLI           LI
RLLI          LI
RPR           RPR
RQU           QU
RSG           SG
RTB           TB
RXIS          ISAM
START         AC
STORE         INTER
TABLE         TB
TIME          TIME
UPDATR        READR
UPDATS        QSAM
WAIT          AC
WAITD         BDAM
WRITD         BDAM
WRITR         READR
WRITS         QSAM


10.0  INTERNAL PARAMETER DEFINITIONS


IN ADDITION TO ITS INPUT OR "EXTERNAL" ATTRIBUTES, AN ENTITY IN
THE SYSTEM MAY ALSO HAVE ATTRIBUTES (OR PARAMETERS) WHICH ARE
USED ONLY IN THE INTERNAL OPERATIONS OF THE SYSTEM. IN FACT, SOME
TABLES, SUCH AS THE BUFFER (BU) TABLES, CONSIST SOLELY OF INTERNAL
PARAMETERS. DEFINITIONS OF ALL INPUT PARAMETERS HAVE BEEN GIVEN IN
SECTION 3, AND IT IS THE PURPOSE OF THIS SECTION TO DEFINE THE
INTERNAL PARAMETERS.


FOR EACH TABLE, THE IDENTIFIER OF THE TABLE IS GIVEN (FOR
EXAMPLE, "FL" FOR FILE TABLE), AND THEN THREE ITEMS OF INFORMATION
ARE GIVEN FOR EACH INTERNAL ATTRIBUTE DESCRIBED BY THE TABLE:

(1) ATTRIBUTE IDENTIFIER. FOR EXAMPLE "RSIZ" FOR "RECORD SIZE".

(2) ATTRIBUTE TYPE. R = REAL, I = INTEGER, A = ALPHAMERIC

(3) ATTRIBUTE DEFINITION

AS DESCRIBED IN SECTION 9.1, A PROGRAM REFERENCE TO THE RECORD
SIZE OF FILE "FIL" WOULD BE GIVEN BY "FLRSIZ(FIL)".

THOSE ATTRIBUTES CONTAINING REPEATING INFORMATION (SEE SECTION
9.1) ARE MARKED WITH AN ASTERISK (*).


SOME OF THE INTERNAL ATTRIBUTES ARE "INDEX POINTERS" (SEE SECTION
9.1) TO ENTITIES IN TABLES, AND IN SOME CASES CORRESPOND TO AN
INPUT PARAMETER WHICH IS THE NAME OF THE ENTITY. IN SUCH A CASE,
THE ATTRIBUTE WILL BE DEFINED AS "POINTER (XXXX)", WHERE "XXXX"
IS THE NAME OF THE ATTRIBUTE.

10.1  DEVICE PROTOTYPE TABLE (DP)

PTR   I   POINTER (TB)
PTA   I   POINTER (TABA)
PTS   I   POINTER (TABS)


10.2  CHANNEL TABLE (CH)

TCC   I   TIME-OF-COMPLETION CHAIN WORD
TC    R   TIME OF COMPLETION
QI    I   CHANNEL QUEUE CHAIN WORD: POINTER TO MOST RECENT
          REQUEST FOR THIS CHANNEL IN TABLE
QO    I   POINTER TO NEXT REQUEST FOR THIS CHANNEL
BUSY  I   BUSY FLAG, NON-ZERO IF CHANNEL IS BUSY


10.3  CONTROL UNIT TABLE (CU)

CH    I*  POINTERS TO CHANNELS IT MAY BE ATTACHED TO
DV    I*  POINTERS TO DEVICES ATTACHED TO IT
CCH   A   NAME OF CURRENT CHANNEL - MAINTAINED BY "PIO" ROUTINE
BUSY  I   BUSY FLAG, NON-ZERO IF CONTROL UNIT IS BUSY
USE   I   CONTROL UNIT USE COUNTER - NUMBER OF SEEKS INITIATED
          BY THIS CONTROL UNIT FOR WHICH TRANSMISSION HAS NOT
          TAKEN PLACE


10.4  DEVICE TABLE (DV)

CU    I   POINTER TO CONTROL UNIT DEVICE IS ATTACHED TO
PTR   I   POINTER (TYPE)
CYL   I   CURRENT CYLINDER UNDER HEAD
AVAL  I   NUMBER OF CYLINDERS AVAILABLE ON DEVICE
BUSY  I   BUSY FLAG, NON-ZERO IF DEVICE IS BUSY
TCC   I   TIME-OF-COMPLETION CHAIN WORD
TC    R   TIME OF COMPLETION
QI    I   DEVICE QUEUE CHAIN WORD: POINTER TO MOST RECENT
          REQUEST FOR THIS DEVICE IN TABLE
QO    I   POINTER TO NEXT REQUEST FOR THIS DEVICE


10.5  DATASET TABLE (DS)

PTR   I*  POINTERS TO FILE TABLE ENTRIES FOR FILES WHICH BELONG
          TO THIS DATASET
AMPT  I   POINTER TO ACCESS-METHOD-RELATED PARAMETERS ENTRY IN
          ACCESS METHOD PARAMETER TABLE


10.6  FILE TABLE (FL)

TNR   I   TOTAL NUMBER OF RECORDS IN FILE
RSIZ  I   RECORD SIZE
XPTR  I   POINTER TO FIRST EXTENT OF THIS FILE IN EXTENT TABLES
NEX   I   NUMBER OF EXTENTS IN THIS FILE
BPT   I   NUMBER OF BLOCKS PER TRACK OF THIS FILE
TPB   R   1/BPT
TMT   R   TRANSMIT TIME PER BLOCK
EXTP  I   POINTER (EXT)
DVTP  I   POINTER (DEVT)
BUF   I   POINTER TO BUFFER TABLE


10.7  EXTENT TABLE (EX)

PTRD  I   POINTER (DEV)
LREC  I   LAST RECORD OF FILE ON THIS EXTENT


10.8  BUFFER TABLES (BU)

MOST OF THE PARAMETERS IN THE BUFFER TABLES HAVE SIGNIFICANCE
ONLY TO THE ACCESS METHOD; THAT IS, THEY CONTAIN STATUS OF AN
OPEN FILE FOR USE BY THE ACCESS METHOD. TWO PARAMETERS WITH
SYSTEM SIGNIFICANCE, HOWEVER, ARE:

(1) CUB - MUST BE NON-ZERO WHILE FILE IS OPEN AS AN INDICATION
          THAT THE BUFFER ENTRY IS IN USE

(2) BUF - THIS IS A REPEATING ATTRIBUTE, EACH INSTANCE OF
          WHICH CORRESPONDS TO A BUFFER AVAILABLE FOR USE BY
          THE FILE USING THIS BUFFER TABLE ENTRY. TWO
          SUBSCRIPTS ARE REQUIRED TO ACCESS THE ATTRIBUTE
          INFORMATION FOR A BUFFER; FOR EXAMPLE,
          "BUBUF(BUF,I)" WOULD CONTAIN INFORMATION RELATING TO
          THE "I-TH" BUFFER OF BUFFER TABLE ENTRY "BUF".
          WHEN A REQUEST FOR I/O IS PRESENTED TO THE SYSTEM
          BY AN ACCESS METHOD (BY THE "AC" ROUTINE, SECTION
          8.1), A (BUF,I) PAIR IS ALSO SPECIFIED. UPON RECEIPT
          OF THE REQUEST BY THE SYSTEM, A NON-ZERO VALUE IS
          STORED IN BUFFER "BUBUF(BUF,I)" BY THE SYSTEM, AND
          WHEN THE REQUEST IS SATISFIED, "BUBUF(BUF,I)" IS
          ZEROED OUT AGAIN.
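
THE USE OF THE BUFFER WORD AS AN "I/O PENDING" FLAG CAN BE
SKETCHED AS FOLLOWS. THIS IS A PYTHON ILLUSTRATION, NOT THE
SYSTEM'S FORTRAN: THE DICTIONARY STANDS IN FOR THE TWO-SUBSCRIPT
ARRAY, AND THE FUNCTION NAMES ARE HYPOTHETICAL SIMPLIFICATIONS OF
THE "AC" ENTRY POINT AND THE END-OF-TRANSMIT LOGIC.

```python
BUBUF = {}  # (buf, i) -> flag; a missing key or 0 means no request pending


def request_io(buf, i):
    """On receipt of an I/O request, store a non-zero value in the buffer word."""
    BUBUF[(buf, i)] = 1


def complete_io(buf, i):
    """When the request is satisfied, zero the buffer word out again."""
    BUBUF[(buf, i)] = 0


def io_pending(buf, i):
    """An access method can test the buffer word to see if I/O is outstanding."""
    return BUBUF.get((buf, i), 0) != 0
```

AN ACCESS METHOD THAT MUST WAIT FOR A SPECIFIC REQUEST ONLY NEEDS
THE (BUF,I) PAIR IT SUPPLIED WHEN THE REQUEST WAS MADE.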

IN THE FOLLOWING, THE PARAMETERS ARE DEFINED AS THEY WERE FOR THE
SEQUENTIAL ACCESS METHOD.

LSTR  I   LAST RECORD RECEIVED FROM (WRITE) OR SENT TO (READ)
          PROGRAM
CUB   I   CURRENT BUFFER (OF BUFFER ENTRIES) INTERFACING RECORDS
          TO PROGRAM
NXR   I   NEXT RECORD (OF BLOCK OF RECORDS REPRESENTED BY "CUB")
          TO BE INTERFACED WITH PROGRAMS
BUF   I*  BUFFER ENTRIES
UP    I   UPDATE FLAG. =1 IF LAST RECORD READ IS TO BE UPDATED.
STAT  A   FILE STATUS. R=READ SEQUENTIAL, W=WRITE SEQUENTIAL.
LBS   I   NOT USED
LRS   I   NOT USED
LHA   I   NOT USED
EOL   I   NOT USED


10.9  LIST TABLE (LI)

TYPN  I   LIST TYPE CODE
          1 = LITERAL (LL)
          2 = SEQUENTIAL (SL)
          3 = RANDOM (RL,RQ)
          4 = RANDOM SEQUENTIAL (RS,SQ)
MODN  I   LIST MODE CODE
          1 = R
          2 = I
          3 = A
DPTR  I   POINTER (DIST)
PTR   I   POINTER TO FIRST ELEMENT OF THIS LIST IN LITERAL LIST
          ENTRY TABLE (LETAB)

THE FOLLOWING ARE DYNAMIC PARAMETERS WHICH RECORD THE CURRENT
STATUS OF A LIST. AT PROCEDURE INITIATION, THE DYNAMIC PARAMETERS
WILL CONTAIN THE SAME VALUES AS THEIR CORRESPONDING "STATIC", OR
INPUT PARAMETERS, BUT WILL CHANGE DURING EXECUTION AS ELEMENTS
ARE REMOVED FROM THE LISTS.

SIZD  I   SIZE OF LIST
ILO   I   LOW VALUE (I-LIST)
IHS   I   HIGH OR SKIP VALUE (I-LIST)
RLO   R   LOW VALUE (R-LIST)
RHS   R   HIGH OR SKIP VALUE (R-LIST)
LPTR  I   POINTER TO NEXT ELEMENT OF A LITERAL LIST


10.10  PROCEDURE TABLE (PR)

OPN   I   OPERATION CODE NUMBER
OPTR  I   POINTER (OBJ)
LPTR  I   POINTER (LIST)
SPTR  I   POINTER (SGC)
FPTR  I   POINTER (FGC)
SYNC  I   "SYNC" CHAIN WORD - TIES TOGETHER PROCEDURE STATEMENTS
          WHICH HAVE BEEN SYNC'ED
WPTR  I   NOT USED
NO    I   STATEMENT NUMBER (FOR PROCEDURE PRINT AND DIAGNOSTICS)
ER    I   ERROR FLAG. =1 IF PROCEDURE INTERPRETATION HAS FOUND
          AN ERROR IN THIS STATEMENT.
LREC  I   LAST RECORD ACCESSED BY THIS STATEMENT (IF AN ACCESS
          OP)

INCLUDED WITH THE PROCEDURE TABLE IS THE TIMER TABLE (TI), WITH
THE FOLLOWING PARAMETERS:

NAME  A   TIMER NAME
TIMS  I   POINT IN SIMULATED TIME WHEN THIS TIMER WAS SET
TIMR  I   VALUE OF REAL TIME CLOCK WHEN THIS TIMER WAS SET. THE
          REAL TIME CLOCK COUNTS DOWN IN 26 MICROSECOND UNITS.

10.11  DISTRIBUTION TABLE (DI)

SIZE  I   NUMBER OF ENTRIES IN DISTRIBUTION
PTR   I   POINTER TO ENTRIES IN DISTRIBUTION CONTENTS TABLE:
          DEARG = TABLE OF ARGUMENTS (RANDOM VARIABLE)
          DEVAL = TABLE OF VALUES (CUMULATIVE DIST. VALUES)
CLAS  I   TYPE/MODE CLASS
          1 = INTEGER/INTERPOLATE (OR INTEGER CONTINUOUS)
          2 = INTEGER/NO INTERPOLATE
          3 = ALPHAMERIC
          4 = REAL/DISCRETE
          5 = REAL/CONTINUOUS


10.12  QUALIFICATION TABLE (QU)

IN THE FOLLOWING, "INTERVAL" REFERS TO THE PHENOMENON INTRODUCED
BY "SORT" QUALIFICATIONS (SEE SECTION 3.5), NAMELY, THAT
QUALIFICATIONS INVOLVING SORT FIELDS QUALIFY RECORDS OR SEGMENTS
OVER SOME SUBSET OF THE DATASET RECORD RANGE.

VALI  I   FIELD VALUE (INTEGER)
VALR  R   FIELD VALUE (REAL)
PSQI  R   FRACTION OF SEGMENTS QUALIFYING OVER INTERVAL
PSQ   R   FRACTION OF SEGMENTS QUALIFYING OVERALL
PRQI  R   FRACTION OF DATASET MASTERS QUALIFYING ON THE INTERVAL
NRQ   I   NUMBER OF DATASET MASTERS (OR RECORDS) QUALIFYING
LRQ   I   LOW RECORD QUALIFYING
HRQ   I   HIGH RECORD QUALIFYING
TYPE  I   QUALIFICATION TYPE (SEE SECTION 3.5)
          1 = FIELD
          2 = BOOLEAN
          3 = SEGMENT
FTYP  A   FIELD TYPE (FOR TYPE 1 QUALIFICATION) I, R, A
SGPT  I   POINTER TO SEGMENT BEING QUALIFIED
FQPT  I   POINTER TO FIELD
Q13P  I   POINTER TO "Q1" OR "Q3" (SEE SECTION 3.5)
Q2PT  I   POINTER TO "Q2"
NXPT  I   POINTER TO NEXT QUALIFICATION TO INTERPRET IF THIS ONE
          IS MODIFIED (NOT IMPLEMENTED)
IFLG  I   INTERPRETATION FLAG
          0 = QUALIFICATION NOT INTERPRETED
          1 = QUALIFICATION INTERPRETED
          2 = QUALIFICATION IN ERROR
SFLG  I   SORT FLAG, NON-ZERO IF THIS IS A SORT QUALIFICATION

10.13 SEGMENT TABLE (SG)

SUPP I POINTER (SUP)

GSPT I POINTER (CS)

TNS  I TOTAL NUMBER OF SEGMENTS OF THIS TYPE

DSMP I POINTER TO THE DATASET MASTER SEGMENT OF THIS SEGMENT

NPDM I NUMBER OF THESE SEGMENTS PER DATASET MASTER


10.14 FIELD TABLE (FD)

SGPT I POINTER TO SEGMENT THIS FIELD BELONGS TO

DPTR I POINTER (DIST)

SIPT I POINTER (SILS)

10.15 TABLE TABLES (TB)

SIZE I NUMBER OF ENTRIES IN TABLE

PTR  I POINTER TO FIRST ENTRY IN TABLE CONTENTS TABLE (TBENT)


10.16 QUEUE TABLE (Q) (HOLDS REQUESTS FOR I/O)

CHAIN I CHAIN WORD FOR CHAINING QUEUE ELEMENTS TOGETHER

DEV   I DEVICE REQUESTED

CYL   I CYLINDER

TRKP  R TRACK POSITION OF RECORD

TMT   R TRANSMIT TIME OF RECORD

BUFF  I POINTER TO BUFFER

TYP   A I/O REQUEST TYPE

        R  = READ

        W  = WRITE

        WV = WRITE VERIFY (NOT IMPLEMENTED)
11.0 SYSTEM DISTRIBUTION AND MAINTENANCE


THE  PHASE  II  SYSTEM  AS  DISTRIBUTED  CONSISTS  OF  A  TAPE  AND  AN 
EXPLANATORY  LISTING,  AS  DESCRIBED  IN  THE  FOLLOWING. 


11.1 DISTRIBUTION TAPE

THE DISTRIBUTION TAPE (800 BPI, UNLABELLED) CONTAINS SIX FILES AS
FOLLOWS:

(1) DATASET "SYSTEM", A PARTITIONED DATASET ("PDS") WHOSE MEMBERS
    ARE OBJECT MODULES OF THE SYSTEM. THE NAMES OF THESE MEMBERS
    ARE OF THE FORM "XXXXXNNN", WHERE "XXXXX" IS A MODULE NAME,
    AS GIVEN IN SECTION 9.3, AND "NNN" IS A SERIAL NUMBER,
    USED TO DENOTE DIFFERENT GENERATIONS OF OBJECT MODULES.

(2) DATASET "SOURCE", A PDS WHOSE MEMBERS ARE SOURCE MODULES. THE
    NAMES OF THE SOURCE MODULES ARE GIVEN IN SECTION 9.3.

    THE SOURCE MODULES IN "SOURCE" ARE NOT COMPLETE, IN THAT THEY
    DO NOT CONTAIN THEIR REQUISITE TABLE SPECIFICATIONS. WHEN
    COMPILING SOURCE MODULES, THE REQUIRED TABLES (SEE SECTION
    9.3) MUST BE (PRE-)CONCATENATED WITH THE MODULE.

(3) DATASET "LINK", A SEQUENTIAL DATASET ("SDS") USED AS THE
    "SYSLIN" DATASET IN THE SYSTEM LINK-EDIT. IT CONTAINS CARDS
    OF THE FORM "INCLUDE(XXXXXNNN)", WHERE "XXXXXNNN" IS AN OBJECT
    MODULE NAME.

(4) DATASET "TABLES", A PDS WHOSE MEMBERS ARE FORTRAN-LANGUAGE
    TABLE SPECIFICATIONS. THE NAMES OF THE MEMBERS ARE THE NAMES
    OF THE TABLES AS GIVEN IN SECTION 10.0.

(5) DATASET "USER", AN SDS CONTAINING THE USER GUIDE.

(6) DATASET "UPDATE", AN SDS CONTAINING UPDATES TO THE SOURCE
    MODULES. THEY ARE UPDATES TO THE SOURCE MODULES DISTRIBUTED
    WITH THE SYSTEM AND ARE ALREADY REFLECTED IN THE DISTRIBUTED
    OBJECT MODULES. FUTURE MINOR UPDATES, HOWEVER, CAN BE
    DISTRIBUTED IN THE FORM OF UPDATE CARDS TO BE MERGED WITH THE UPDATE
    DECK, WHICH THEN WOULD BE RUN AGAINST THE SOURCE DECKS AND
    COMPILED TO OBTAIN A NEWLY UPDATED OBJECT MODULE.
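
THE MEMBER NAMING CONVENTION OF FILE (1) CAN BE ILLUSTRATED WITH THE
SHORT PYTHON SKETCH BELOW. IT IS NOT PART OF THE DISTRIBUTED SYSTEM.
THE FUNCTION NAMES AND THE SAMPLE MEMBER NAMES ARE HYPOTHETICAL, AND THE
SKETCH ASSUMES THAT A HIGHER SERIAL NUMBER DENOTES A LATER GENERATION,
WHICH THE TEXT DOES NOT STATE EXPLICITLY.

```python
def split_member(name):
    """Split a member name of the form XXXXXNNN into (module, serial)."""
    if len(name) != 8 or not name[5:].isdigit():
        raise ValueError(f"not of the form XXXXXNNN: {name!r}")
    return name[:5], int(name[5:])


def latest_generation(members):
    """Return, for each module, the member with the highest serial number."""
    latest = {}
    for member in members:
        module, serial = split_member(member)
        # Keep the member only if it is newer than the one already seen.
        if module not in latest or serial > latest[module][0]:
            latest[module] = (serial, member)
    return sorted(member for _, member in latest.values())
```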


11.2 DISTRIBUTION LISTING

THE DISTRIBUTION LISTING IS THE OUTPUT OF A JOB WITH THE FOLLOWING
JOB STEPS:

(1) INITIALIZE DISK BY SCRATCHING DATASETS TO BE CREATED BY STEP
    (3), IF THEY EXIST

(2) COPY THE PHASE II MASTER SYSTEM FROM DISK TO DISTRIBUTION TAPE

(3) COPY THE SYSTEM DATASETS FROM TAPE BACK TO THE DISK TO TEST

(4) LINK-EDIT THE SYSTEM

(5) EXECUTE A TEST EXAMPLE

(6) DEMONSTRATE THE USE OF THE UPDATE DATASET

(7) DEMONSTRATE A COMPILE OF A SOURCE MODULE

(8) PRINT THE USER GUIDE FROM TAPE

(9) SCRATCH DATASETS CREATED BY (3)
12.0 MATHEMATICAL MODELS


THIS SECTION CONTAINS SOME DERIVATIONS AND FORMULAS WHOSE
MOTIVATION OR FORM MAY NOT BE OBVIOUS FROM THE PROGRAM LISTINGS.


12.1 QUALIFICATION

THERE ARE SEVERAL CALCULATIONS TO BE PERFORMED IN ORDER TO
DETERMINE THE RANGE AND VOLUME OF DATASET MASTER RECORDS WHICH
QUALIFY FOR EACH QUALIFICATION SPECIFICATION. THESE CALCULATIONS
DEPEND ON QUALIFICATION TYPE (FIELD, BOOLEAN, OR SEGMENT), AND
WHETHER OR NOT SORT FIELDS ARE INVOLVED. FURTHERMORE, FOR SEGMENT
QUALIFICATION, THEY DEPEND ON WHETHER OR NOT "SEG" IS SUPERIOR TO
THE SEGMENT QUALIFIED BY "Q3". IN THE FOLLOWING, WE WILL DISCUSS
THE FORMULAS FOR SOME OF THESE CASES; HOWEVER, WE WILL LIST ALL
CASES, AND THE USER CAN TURN TO THE PROGRAM LISTINGS FOR FURTHER
AMPLIFICATION.

IN THE FOLLOWING, "SEGMENT" REFERS TO THE SEGMENT BEING QUALIFIED
BY THIS QUALIFICATION, AND "MASTER" REFERS TO THE DATASET MASTER
SEGMENT OF "SEGMENT".

FOR EACH QUALIFICATION WE WISH TO COMPUTE:

PSQI - FRACTION OF SEGMENTS QUALIFYING ON THE INTERVAL (IF A
       QUALIFICATION IS NOT A "SORT QUALIFICATION", THE INTERVAL
       IS THE WHOLE RANGE OF SEGMENT OCCURRENCES)

PSQ  - FRACTION OF SEGMENTS QUALIFYING OVERALL

PMQI - FRACTION OF MASTERS QUALIFYING ON THE INTERVAL

NRQ  - NUMBER OF MASTERS QUALIFYING

LRQ  - "LOW" MASTER QUALIFYING

HRQ  - "HIGH" MASTER QUALIFYING


WE MAKE THE FOLLOWING DEFINITIONS:

N  - NUMBER OF SEGMENTS PER DATASET MASTER

NO - TOTAL NUMBER OF DATASET MASTER SEGMENTS (= NUMBER OF
     RECORDS IN DATASET)


(1) A GENERAL CALCULATION

IF A FRACTION "P" OF THE SEGMENTS IN QUESTION QUALIFY, THEN
(UNDER ORDINARY CIRCUMSTANCES), THE OTHER PARAMETERS ARE
CALCULATED AS FOLLOWS:

PSQ = PSQI = P

PMQI = 1. - PROB. THAT THE MASTER SEGMENT DOESN'T QUALIFY
     = 1. - (PROB. THAT NONE OF ITS INFERIOR SEGMENTS IN
             QUESTION QUALIFY)
     = 1. - (1. - PSQ)**N

LRQ = 1

HRQ = NO
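
THE GENERAL CALCULATION ABOVE CAN BE WRITTEN OUT AS THE FOLLOWING
PYTHON SKETCH. THE FUNCTION NAME AND THE DICTIONARY RESULT ARE
HYPOTHETICAL AND DO NOT REPRODUCE THE PHASE II FORTRAN; THE FORMULA
TREATS THE N INFERIOR SEGMENTS OF A MASTER AS QUALIFYING INDEPENDENTLY,
WHICH IS WHAT THE (1. - PSQ)**N TERM IMPLIES.

```python
def general_calculation(p, n, no):
    """Parameters for a qualification with no sort field involved.

    p:  fraction of the segments in question that qualify
    n:  number of segments per dataset master (N in the text)
    no: total number of dataset master segments (NO in the text)
    """
    psq = psqi = p
    # A master qualifies unless none of its n inferior segments qualify.
    pmqi = 1.0 - (1.0 - psq) ** n
    return {"PSQ": psq, "PSQI": psqi, "PMQI": pmqi, "LRQ": 1, "HRQ": no}
```

FOR EXAMPLE, WITH P = 0.5 AND N = 2 THE FRACTION OF MASTERS QUALIFYING
IS 1 - 0.25 = 0.75.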

(2) FIELD QUALIFICATION (NAME QUAL FLD,REL,VAL)

LET "P" BE THE FRACTION OF FIELDS QUALIFYING.

(A) "FLD" IS NOT A SORT FIELD - CALCULATE AS IN (1)

(B) "FLD" IS A SORT FIELD

LET DH = FRACTION OF FIELD VALUES "LESS THAN" THE HIGHEST
         QUALIFYING VALUE OF THE FIELD

PSQI = 1

PSQ = P

PMQI = 1

NRQ = PSQ*NO

HRQ = NO*DH

LRQ = HRQ - NRQ + 1
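
AS A WORKED ILLUSTRATION OF CASE (B), THE PYTHON SKETCH BELOW COMPUTES
THE QUALIFYING RECORD RANGE FOR A SORT FIELD. THE FUNCTION NAME IS
HYPOTHETICAL, AND THE ROUNDING TO WHOLE RECORD NUMBERS IS AN ASSUMPTION
OF THE SKETCH; THE TEXT DOES NOT SAY HOW FRACTIONAL RESULTS ARE HANDLED.

```python
def sort_field_qualification(p, dh, no):
    """Record range qualifying on a sort field.

    p:  fraction of fields qualifying
    dh: fraction of field values "less than" the highest qualifying value
    no: total number of dataset master segments (NO in the text)
    """
    nrq = round(p * no)    # number of masters qualifying (rounding assumed)
    hrq = round(no * dh)   # "high" master qualifying (rounding assumed)
    lrq = hrq - nrq + 1    # "low" master qualifying
    return {"PSQI": 1, "PSQ": p, "PMQI": 1, "NRQ": nrq, "HRQ": hrq, "LRQ": lrq}
```

WITH P = 0.25, DH = 0.5 AND NO = 1000, RECORDS 251 THROUGH 500 QUALIFY.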

(3) BOOLEAN QUALIFICATION (NAME QUAL Q1,REL1,Q2)

LET PSQA = PSQ FOR SEGMENT QUALIFIED BY Q1
    PSQB = PSQ FOR SEGMENT QUALIFIED BY Q2

(A) NEITHER Q1 NOR Q2 IS A "SORT" QUALIFICATION

P = PSQA*PSQB IF REL1 = "AND"

P = PSQA+PSQB-PSQA*PSQB IF REL1 = "OR"

FINISH AS IN (1)

(B) Q1 OR Q2 IS A SORT QUALIFICATION
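
CASE (A) OF THE BOOLEAN QUALIFICATION AMOUNTS TO COMBINING TWO
PROBABILITIES THAT ARE TREATED AS INDEPENDENT. A MINIMAL PYTHON SKETCH
FOLLOWS; THE FUNCTION NAME IS HYPOTHETICAL AND CASE (B), FOR WHICH NO
FORMULA IS GIVEN HERE, IS NOT COVERED.

```python
def boolean_fraction(psqa, psqb, rel):
    """Fraction P of segments qualifying under Q1 rel Q2 (no sort fields).

    psqa, psqb: PSQ of the segments qualified by Q1 and by Q2
    rel:        "AND" or "OR"
    """
    if rel == "AND":
        return psqa * psqb
    if rel == "OR":
        # Inclusion-exclusion for two independent events.
        return psqa + psqb - psqa * psqb
    raise ValueError(f"unsupported relation: {rel!r}")
```

THE RESULTING P IS THEN CARRIED INTO THE GENERAL CALCULATION OF (1).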

(4) SEGMENT QUALIFICATION (NAME QUAL SEG,HAS,Q3,REL2,N)

LET:
SEG2 = THE SEGMENT QUALIFIED BY Q3
NS2 = NUMBER OF SEG2 SEGMENTS PER SEG
PS2 = PROBABILITY THAT A SEG2 QUALIFIES BY Q3
C(N1,N2) = NUMBER OF COMBINATIONS OF N1 THINGS TAKEN N2 AT
           A TIME


(A) SEG IS SUPERIOR TO SEG2

(A1) Q3 IS NOT A SORT QUALIFICATION

THE PROBABILITY THAT EXACTLY M SEG2'S QUALIFY FOR A
RANDOM SEG IS GIVEN BY:

P = C(NS2,M)*(PS2**M)*((1-PS2)**(NS2-M))

FINISH AS IN (1)

(A2) Q3 IS A SORT QUALIFICATION; SEG AND SEG2 ARE ON DIFFERENT
     DATASETS

(A3) Q3 IS A SORT QUALIFICATION; SEG AND SEG2 ARE ON THE SAME
     DATASET


(B) SEG INFERIOR TO SEG2

(B1) SEG, SEG2 ON THE SAME DATASET

(B2) SEG, SEG2 ON DIFFERENT DATASETS
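
THE PROBABILITY IN CASE (A1) IS THE ORDINARY BINOMIAL PROBABILITY, WHICH
THE PYTHON SKETCH BELOW EVALUATES. THE FUNCTION NAME IS HYPOTHETICAL.
HOW THE RELATION REL2 AND THE COUNT N ARE APPLIED TO THIS PROBABILITY IS
NOT SPELLED OUT IN THIS SECTION, SO THE SKETCH STOPS AT THE PROBABILITY
OF EXACTLY M QUALIFYING SEG2 SEGMENTS.

```python
from math import comb


def prob_exactly(m, ns2, ps2):
    """Probability that exactly m of the ns2 SEG2 segments of a SEG qualify.

    ns2: number of SEG2 segments per SEG
    ps2: probability that a single SEG2 qualifies by Q3
    """
    return comb(ns2, m) * ps2 ** m * (1.0 - ps2) ** (ns2 - m)
```

SUMMED OVER M = 0, 1, ..., NS2 THESE PROBABILITIES ADD UP TO 1, WHICH IS
A CONVENIENT CHECK ON THE EXPONENT OF THE (1-PS2) TERM.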


12.2 ISAM OVERFLOW CHAIN LENGTH DISTRIBUTION

THIS PROBLEM CAN BE STATED AS FOLLOWS:

LET: M = NUMBER OF PRIME TRACKS INITIALLY FULL

     A = NUMBER OF OVERFLOW RECORDS TO BE ASSIGNED TO THE M
         TRACKS AT RANDOM

WHAT IS THE PROBABILITY Q(N) THAT A TRACK CHOSEN AT RANDOM HAS
EXACTLY N OVERFLOW RECORDS?

THIS PROBLEM CAN BE TREATED AS A CLASSICAL OCCUPANCY PROBLEM
(SEE WILLIAM FELLER, AN INTRODUCTION TO PROBABILITY THEORY AND
ITS APPLICATIONS, WILEY, 1957, P. 34), WITH A SOLUTION AS
FOLLOWS:

Q(N) = C(A,N)*((1/M)**N)*((1-1/M)**(A-N))

WHERE C(A,N) IS THE NUMBER OF COMBINATIONS OF A THINGS TAKEN N AT
A TIME.

HOWEVER, THE DESIRED DISTRIBUTION EXCLUDES N=0, THAT IS,
TRACKS WHICH HAVE NO OVERFLOW CHAIN. THE PROBABILITY P(N) THAT
AN OVERFLOW CHAIN IS OF LENGTH N (N>0) IS THEREFORE:


P(N) = Q(N)/(1-Q(0))
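

THE TWO STEPS ABOVE (THE OCCUPANCY PROBABILITY FOR A TRACK, THEN
RENORMALIZATION OVER TRACKS THAT HAVE AT LEAST ONE OVERFLOW RECORD) CAN
BE SKETCHED IN PYTHON AS FOLLOWS. THE FUNCTION NAMES ARE HYPOTHETICAL
AND THE SKETCH IS NOT TAKEN FROM THE PROGRAM LISTINGS.

```python
from math import comb


def q(n, a, m):
    """Probability that a random track holds exactly n of the a overflow
    records, when each record falls on one of m tracks at random."""
    return comb(a, n) * (1.0 / m) ** n * (1.0 - 1.0 / m) ** (a - n)


def chain_length_prob(n, a, m):
    """Probability that an overflow chain has length n, for n > 0.

    Tracks with no overflow chain are excluded, so q(n) is divided by the
    probability that a track has at least one overflow record.
    """
    if n <= 0:
        raise ValueError("chain length must be positive")
    return q(n, a, m) / (1.0 - q(0, a, m))
```

FOR A = 2 RECORDS ON M = 2 TRACKS, Q(0) = 0.25, SO CHAINS OF LENGTH 1
AND 2 HAVE PROBABILITIES 0.5/0.75 AND 0.25/0.75, WHICH SUM TO 1.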


END  OF  PHASE  II  USER  GUIDE 


PLEASE  SEND  CORRECTIONS,  COMMENTS,  AND  SUGGESTIONS  TO 

P. J. OWENS
IBM  RESEARCH 
KC6/C24 

MONTEREY AND COTTLE ROADS
SAN  JOSE,  CALIFORNIA  95114 
GLOSSARY


Definitions of selected terminology used in this contract report are given
below for easy reference:

• Access Method - A data management program providing the retrieval, update and
  maintenance function for a data set.

• Access Strategy - The order and the access methods to be used in accessing all the
  relevant records requested in a transaction.

• Allocation - The assignment of records to specified locations of storage.

• BDAM (Basic Direct Access Method) - A specific IBM implementation of the direct
  access file organization.

• BISAM (Basic Indexed Sequential Access Method) - Two modes of access to ISAM data
  sets. In retrieval, the access method program is presented with a record
  identifier and it searches through indexes to retrieve the desired record.
  In insertion, a record is presented to the program and it is inserted in
  logical sequence on the basis of its identifier.

• Block (Physical Record) - A series of physically contiguous characters on a
  physical storage device; the unit of information transfer from peripheral
  devices to core memory. Blocks may contain one (unblocked) or more logical
  records.

• Blocking Factor - The number of logical records per (physical) block.

• BSAM (Basic Sequential Access Method) - A mode of accessing a SAM file organization
  where the user provides any required buffering and deblocking.

• Bucket - In the direct access organization, a set of one or more record positions
  associated with a particular address. Records whose identifier (or key) transforms
  to this address will be stored in this bucket or its extensions.
• Buffer - An area in core storage set aside to hold the contents of one block from
  a physical storage device.

• Byte - A generic term for a group of adjacent binary bits used as a unit. Typical
  examples are the 8-bit byte and the 6-bit byte.

• Chain - A means of interconnecting a series of information units. The connection
  is by means of a pointer field stored in one unit pointing to the next unit in
  sequence.

• Channel Program - A program to be executed by a channel.

• Cluster - A set of consecutive keys separated at both ends by a gap from other keys.

• Clockworks Model - A computerized simulation model in which complex transactions
  are described in terms of paths through a processing network composed of queues
  before specific processing stations. The lengths of the processing steps and
  the completion times for processing steps on a primitive transaction are
  determined in terms of a master clock which records simulated elapsed time.

• Collision Length - In a key-to-address transformation, select an arbitrary key
  as a starting point. The maximum number of keys following the starting key
  that can be mapped to distinct addresses is called the collision length.

• Control Unit - A device controlling the operation of I/O devices such as disks,
  tapes, printers, etc.

• Core Memory - The section of memory attached directly to the CPU. The CPU can
  only directly address data and instructions stored in core memory.

• CPU (Central Processing Unit) - The computer section that provides primary
  interpretation of the users' programs.

• DASD (Direct Access Storage Device) - A peripheral, physical storage device,
  e.g., drum, disk, data cell.




• Data Base - The totality of the collected data items in an installation.

• Data Management Program - A program in the Operating System that assists the user
  in accessing and managing his data files.

• Data Rate - The speed in bytes per second that a device can transmit or receive
  data.

• Data Set - An IBM term for a data file (and in the case of ISAM, its associated
  index and overflow areas).

• Debugging - The process of detecting and correcting errors in a program.

• Density of the Key Set - The ratio of the number of existing keys to the total
  number of possible keys in the key space.

• Domain - A mathematical concept associated with all the possible values of a set.

• DUMP - The act of printing out the entire contents of the core memory for error
  detection by invoking a system dump routine.

• Entity - A distinguishable object, thing or event on which information is recorded.

• Extent - A collection of stored data items that are both homogeneous and contiguous.
  On a disk storage device, an extent is characterized by the volume number, the
  starting cylinder and the number of cylinders in the extent.

• Field - The smallest information-bearing unit that may be queried and processed
  by a formatted file system.

• FOREM I (or FSSM) (File Organization Evaluation Model (or File Structure Simulation
  Model)) - An off-line equation evaluation model for simulating the complex query
  and update transactions typical of next generation formatted file systems. (See
  Final Report, Contract AF 30(602)-4088 for details.)
• FOREM II (File Organization Evaluation Model, Phase II) - An off-line clockworks
  model for simulating the complex query and update transactions typical of next
  generation formatted file systems.

• FORMS (File Organization Modeling System) - An on-line, combined equation
  evaluation-clockworks model for simulating primary key access methods. (See User's
  Manual completed under Contract AF 30602-69-C-0100 for details.)

• Full Track Blocking - One (physical) block per track.

• Galois Field - A finite field defined in the mathematical sense.

• Identifier - A group of fields which provides unique identification for a record
  or segment.

• Insertion - The placement of a new record into a file.

• Inverted File - A sequence of records ordered according to the magnitude of the
  value of a field other than the primary key field.

• I/O Supervisor - A control program for I/O operations.

• ISAM (Indexed Sequential Access Method) - A specific IBM implementation of an
  indexed sequential primary key file organization.

• JCL (Job Control Language) - A language for specifying the characteristics of a
  particular program to an IBM Operating System.

• Key-to-Address Transformation - A technique of converting a set of keys into a
  set of addresses on a storage device.

• Load Factor - (See Packing Factor)

• Master Segment - A segment that appears at the root of a hierarchic tree structure
  for a logical record. The master segment is superior to all periodic segments
  and appears once and only once per logical record.
• Master Index - In ISAM, any index level above the cylinder index.

• OS (Operating System) - An IBM supplied system of programs for controlling and
  assisting the execution of user programs.

• Overflow - The condition in which the number of records assigned to a unit of
  storage space (a bucket, a track, etc.) exceeds the capacity of that space.

• Pack (Volume) - A removable set of disks for a disk storage device.

• Packing Factor - In the direct access organization, the percent of possible record
  positions that are occupied by data records.

• Partition - Disjoint subdivision of a set of objects into smaller subunits.

• Periodic Segment - A segment that appears at a level other than the root of a
  hierarchic tree structure for a logical record. There will be a variable
  number of periodic segments per logical record.

• Processing Time - The elapsed CPU time for accessing and processing a logical
  record.

• QISAM (Queued Indexed Sequential Access Method) - Two modes of access to ISAM data
  sets. In retrieval, the access method program is presented with a record
  identifier and it searches through indexes to find the location of the
  corresponding record. It then proceeds to access, in logical sequence with automatic
  buffering and deblocking, as many logical records as the user requests.
  Updating-in-place is performed in this mode by the user requesting that the modified
  block in core be written back over the corresponding block on the physical storage
  device. The create mode is used to create and load an ISAM data organization onto
  physical storage.

• QSAM (Queued Sequential Access Method) - A mode of accessing a SAM file organization
  where the data management program provides automatic buffering and deblocking.

• Query - The process of accessing a desired set of records from a file.
• Relation - The mathematical definition of a set which is a subset of a product
  space D1 x D2 x ... x Dn and whose elements are of the form of ordered
  n-tuples (d1, d2, ..., dn).

• Repeating Segment - (See Periodic Segment)

• SAM (Sequential Access Method) - A specific IBM implementation of the sequential
  file organization.

• Secondary Index - A cross reference index relating the values of any non-key field
  to either the primary keys or the addresses of the corresponding records.

• Seek Time (Access Motion Time) - The time required to position the access mechanism
  at the cylinder containing the desired record.

• Segment - A specific concatenation of fields providing a description of the
  properties of a particular object or event.

• Trace - A time sequence recording of the occurrence of events.

• Transaction - A collection of several related processing actions in connection
  with an application task.

• Update - The change or modification of one or more field values in a record
  already in the file.

• Variable (Length) Segment - A segment which contains a variable number of
  characters.

• 2314 - A specific type of IBM disk device.

• 2400 - 2, 3, 4, 5, 6 - Various models of IBM tape drives.